1 Introduction

Online deception attacks have attracted significant attention from researchers, and extensive research has been conducted for more than twenty years. The rapid proliferation of email, web-based technologies, smart communication devices, and social media, together with the expanded use of artificial intelligence (AI), has helped cybercriminals devise more sophisticated deception methods and create security threats that are increasingly difficult to detect. Published research suggests that such attacks, especially those using AI tools, are exploited far more professionally than what becomes publicly disclosed [1, 2]. As the complexity of the cybersecurity domain rises, it is becoming more difficult to detect, analyze, and regulate fraudulent events [1]. While technological solutions can reduce the number of online deceptions, purely technical defenses can never be perfect.

Adequate defense against social engineering cyberattacks requires, among other things, a deeper understanding of how human emotional and cognitive factors interact to shape susceptibility to cyberattacks. Simultaneously, efforts should be made to minimize or mitigate the resulting damage at both the personal and enterprise levels [3]. Human decision-making serves as the final barrier against cyberthreats, prompting significant interest in investigating whether and how human cognitive and emotional conditions generate neural processes that can be harnessed to reason about and potentially detect the underlying presence of a cyberattack [4]. As such, there is particular research interest in leveraging brain-computer interfaces and gaze-based apparatus to detect a possible cyberattack early and to assist users in effective decision-making.

According to recent analyses, phishing remains the most widespread and easiest to perform type of cyberattack, with over two (2) million phishing sites reported as of January 2021 [5]. It has become the scourge of the modern era, affecting a wide social spectrum of all classes and ages, through multiple methods and forms. Phishing is the practice of deceiving, pressuring, or manipulating people into sending sensitive information or assets to the wrong people by disguising a message as legitimate. It refers to social engineering practices carried out by email, phone calls, social media, or text messages that pretend to come from trusted service providers and aim to induce end-users to reveal personal information, such as passwords, PIN codes, credit card numbers, and similar sensitive data. Phishing can be defined as a threat which, by virtue of social engineering techniques and/or other technological or non-technological means, enables the attacker to retrieve personal information from victims, causing them monetary or other damage because of this information leakage [6]. Although electronic phishing appeared almost two decades ago, similar techniques can be traced back at least to the nineteenth century. Exploring the history of pre-internet swindling schemes helps draw a bigger picture of current phishing and scamming methods, with one of the most common nineteenth-century techniques being the “Spanish Prisoner Letter” [7]. Phishing attacks usually target a large number of “victims,” and hence attackers often prefer communication methods with extensive outreach. For example, attacks can be sent over e-mail or SMS (smishing), or can leverage voice (vishing) and/or social media channels (Facebook, Twitter, etc.). Typically, the intent is to steal credentials or financial information from users (i.e., identity theft and cat-phishing).
Identity theft involves stealing private information such as credit card numbers, tax or social security numbers, name, address, date of birth, or other similar sensitive information for the direct financial gain of the attacker, whereas cat-phishing relies mainly on impersonating someone in order to ask victims to send money to the attacker [8].

Phishing has attracted significant attention from researchers, and extensive work has been conducted, with an exponential increase in phishing-related research papers during the last twenty (20) years (Fig. 1).

Fig. 1 Phishing research published during the last twenty (20) years indicating an exponential growth in number of scientific papers related to “phishing” research domain (results derived from ACM, Scopus, IEEE, Springer, Web of Science with search criteria: “phishing”)

The topic of mitigating the effects of a phishing attack can be approached from several perspectives [9]. Numerous works on phishing mitigation tactics have been proposed that can be categorized under three main topics: i) phishing filtering, ii) phishing detection support, and iii) user education and training. Phishing filtering approaches leverage machine learning to detect and filter out malicious acts prior to being received by the end-users [10]. Examples include machine learning frameworks for scrutinizing webpages [11], phishing attack detection mechanisms based on natural language processing and machine learning [12], machine learning-based anti-phishing systems based on Uniform Resource Locator (URL) features [13], stacking models using URL and HTML features for phishing Web-page detection [14], and methods for detecting phishing certificates using certificate transparency logs [15].
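As an illustration of the URL-feature idea mentioned above, the following minimal sketch extracts a few lexical features of the kind a filtering classifier might consume. The feature set and the function name `url_features` are assumptions made for this example; they are not taken from the systems cited in [13] or [14].

```python
from urllib.parse import urlsplit
import ipaddress


def url_features(url: str) -> dict:
    """Extract a few simple lexical features from a URL (illustrative set only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is a commonly used warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parts.scheme == "https",
    }


features = url_features("http://192.168.10.5/secure-login/update")
```

In practice such a feature dictionary would be converted into a numeric vector and passed to a trained classifier; the choice of model and training data is outside the scope of this sketch.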

Phishing detection support tools typically assist users by providing them with information about a webpage or URL through Web browser warnings, intelligent agents, browser plugins, etc. Example works include that of Althobaiti et al. [16], who investigated different approaches for expressing complex URLs and Web hosting concepts to users in a comprehensible way; the research of Yang et al. (2017) [17], who designed security warnings based on Website traffic ranks; the study of Althobaiti et al. (2018) [18], who proposed an intelligent agent that provides information about URLs with regard to the existence of misspellings, non-ASCII characters, and redirection; and the research of Volkamer et al. (2017) [19], who proposed an extension for email clients that visually highlights the domain of a URL in an email and disables its hyperlink for three (3) seconds to make users aware of the URL. Studies have also examined the effects of users’ characteristics on susceptibility to phishing attacks, aiming to produce knowledge for designing personalized phishing detection support tools. Frequently studied user characteristics include user demographics such as gender and age, disabilities, technicalities, and digital inequalities. However, user demographics have not been proven to be decisive indicators of phishing susceptibility. Gender is often included as a demographic variable within phishing studies; however, results regarding its impact on phishing detection are controversial. While some studies have concluded that a statistically significant relationship between gender and phishing susceptibility does exist [20,21,22,23,24,25,26,27], others have found no such connection [28,29,30,31,32].

Regarding age differences, the results are also controversial. Sheng et al. [24] conclude in their study that people between the ages of 18 and 25 are more susceptible to phishing than other age groups due to a lower level of education, fewer years on the Internet, less exposure to training materials, and less aversion to risk. On the other hand, Robinson et al. [25] suggest that older adults exhibit disparities associated with access, usage, and skills, so they are more likely to encounter challenges with technostress, including managing passwords and maintaining online safety. Lin et al. [14] conclude in their study that while young users showed greater susceptibility than older users to scarcity (a belief that something is in short supply or almost gone), older users showed greater susceptibility than young users to reciprocation (a need to fulfill repayment for a good or service received). Other factors may also affect people’s susceptibility to phishing: differences in computer literacy, with expert users shown to be more sensitive in detecting phishing emails [33]; users’ cognitive abilities, with people who rely on reflective reasoning possibly being in a better position to identify phishing emails than users with intuitive thinking [34]; and the individual personality traits of the recipients of the attack, with users with higher conscientiousness being more likely to become phishing attack victims [21]. At the same time, studies of users with disabilities indicate that people with autism perform on par with or above average in identifying phishing websites [35] and that blind people demonstrate robust reading strategies for identifying phish [36].

End-user education and training approaches, as well as related frameworks, have been proposed to support the decision-making of end-users towards more effective recognition of suspicious emails. Example works include dedicated training sessions during which users are informed about the various existing phishing attacks and mitigation approaches [37]. Other training approaches aim to integrate learning and training within the daily routines of users (e.g., a user receives training after falling victim to a phishing attack), rather than conducting training beforehand [38]. Nevertheless, evidence suggests that educating and training end-users also entails several challenges and weaknesses. For example, studies have shown that individuals tend to forget the trained material and still fall for phishing attacks [39], while training integrated into daily routines is costly, given that administrations need to prepare and distribute simulated yet realistic and up-to-date phishing attacks [14, 40].

1.1 Research motivation and contribution

Phishing research aims to design countermeasures against malicious attempts to steal confidential information from people and to protect them from falling victim to such attempts. Research shows that some users are more likely than others to disclose information when faced with an online scam and that personal characteristics are an important factor in mitigating the risk of phishing [9]. Considering the impact of phishing attacks on “human susceptibility,” it is crucial to develop a comprehensive understanding of the process and characteristics of phishing itself, as well as the underlying human cognitive and emotional processes. Such scientific knowledge might facilitate the design of suitable and personalized anti-phishing security frameworks that consider the individual behavior of end-users, with a specific focus on emotional, cognitive, and personal factors. By leveraging end-users’ physiological responses reflected in brain- and/or gaze-based behavior during phishing tasks, it is possible to expedite the advancement of novel personalization methods and frameworks that aim to differentiate early between individuals engaged in malicious phishing activities and those who are not. This enables the implementation of effective countermeasures to enhance human decision-making capabilities.

In pursuit of this objective, our study delves into the potential efficacy of electroencephalography (EEG) devices in the context of phishing incidents. These devices enable the monitoring of neural activities and cognitive responses, thereby facilitating the inference of correlated brain reactions occurring concurrently during phishing encounters. Furthermore, we explore the utilization of eye-tracking technology, which captures spontaneous responses unaffected by conscious thought. This approach provides an alternative perspective for a comprehensive cognitive assessment of victims’ reactions and stimuli in phishing scenarios. The integration of brain-computer interfaces and eye-tracking technologies has the potential to advance our understanding of cognitive processes and correlated brain responses beyond what either technology can offer independently. By combining these two modalities, we can benefit from improved temporal resolution and complementary information. The vision of this endeavor is to utilize real-time “brain-eye” measures, integrating brain and eye-tracking data, to develop a mechanism that evaluates the trustworthiness of a user’s response while gaining insights into the role of neural measures and cortical activity in defending against phishing attacks. This holistic approach enables us to uncover the underlying mechanisms at work, advancing our understanding of how cognitive processes and physiological responses contribute to effective protection against phishing incidents.

Although several reviews have been identified in the research areas of phishing, electroencephalography (EEG), and eye-tracking when examined individually [41,42,43,44,45,46], at the time of writing we did not identify any systematic literature survey approaching the topic of phishing from both an EEG and an eye-tracking perspective. Our intent in this paper is to identify experiments conducted in this area to investigate the correlation of EEG and eye-tracking measures with the most common phishing types. By analyzing the characteristics of these experiments, we aim to identify which brain and cognitive areas phishing activates, expand our research to other types of phishing, correlate the brain’s reaction to these phishing types, and build cognitive models which, through AI, can help improve anti-phishing tactics.

To this end, this study presents a survey spanning the last ten (10) years, with the aim of identifying phishing research papers that have considered in their study the experimental use of electroencephalography and/or gaze-based interaction, in order to set the stage for future anti-phishing frameworks that leverage collected EEG and gaze-based data to combat phishing. Our goal is to discover experiments conducted in the area of phishing types, determine which brain and cognitive areas phishing activates, extend our research to other types of phishing, correlate the brain’s reaction to them, and create cognitive models that can help improve anti-phishing strategies by analyzing the characteristics of these experiments and mining their results.

The rest of the paper is organized as follows: in Section 2, we provide the theoretical background of our work, Section 3 gives a presentation of the electroencephalography and eye-tracking apparatus, Section 4 presents our research methodology and questions, Section 5 provides a systematic analysis of the existing phishing research related to our research motivation, Section 6 provides discussion and research suggestions, and in Section 7, we conclude with the main findings, further exploration of the applicability and generalizability of our findings and limitations of our work.

2 Theoretical background

2.1 Phishing classification

Phishing is a social engineering technique that, through the use of various methods, aims to exploit weaknesses in system processes in order to influence end-users to reveal sensitive personal information (e.g., email address, username, password, or financial information), which can subsequently be used by the attacker to the detriment of the victim [47]. The logic of this terminology is that an attacker uses “bait” to lure the victim and then “ph-f-ishes” for their personal information [48].

Historically, the first instance of phishing was reported in 1995 when attackers attempted to convince victims to share their AOL account details [49]. The first use of the word phishing in printed media appeared in an article by Ed Stansel writing for the Florida Times Union and published on March 16th, 1997. The term phishing is derived from the word “fishing,” spelt using what is commonly known as Haxor, which replaces Standard English characters with other ASCII characters: a typical rule in Haxor is that the letter “f” is converted to “ph.” The origin of the word phishing is considered to be an extension to the word “phreaking” [50]. The use of “ph” in place of the “f” in the spelling of the term was used to link phishing scams with phreaks, which were some of the earliest hackers [51].

Phishing has a significant impact in both social and economic terms. Verizon’s Data Breach Investigations Report for 2022 reports that 82% of the breaches involved the Social Engineering sector, with phishing contributing more than 65% of it [52]. In the APWG (Anti-Phishing Working Group) report for the 3rd quarter of 2022, a new record of 1,270,883 phishing attacks was reported, making it the worst quarter ever reported [53]. In the U.K. government’s Cyber Security Breaches Survey published in 2022, 83% of businesses that reported some form of cyberattack in the preceding 12 months had also experienced a phishing attack [54]. According to the FBI, phishing emails are the most popular attack method used by hackers to deliver ransomware to individuals and organizations. Lastly, according to IBM’s Cost of a Data Breach Report in 2021, phishing is the fourth most common and second most expensive cause of data breaches, costing businesses an average of USD 4.65 million per breach.

2.1.1 Phishing classification

Several categorizations have been proposed with respect to the techniques employed in phishing attacks. Abdillah et al. (2022) [45] divided phishing techniques into three groups (general, spear, and whale phishing) based on the attack target. General phishing is carried out by phishers attempting scams en masse without maximum effort or personalized means, so the chances of success are meagre. This type of attack is most successful against less attentive users. Spear phishing targets a specific person (or a group of people) via a premeditated medium, often including information known to be of interest to the target, with the aim of intercepting sensitive information. Due to its more personalized execution, this method has typically been perceived as more effective than general phishing in luring its victims. Whale phishing (whaling) targets high-level decision-makers within an organization who have access to highly valuable information; hence, when successful, it yields immediate and more valuable results for the attackers. A second categorization in the same study is based on the different means employed, e.g., website, webpage, email, URL, SMS, and tweets. According to a survey of phishing attack occurrences by means employed over the past 10 years, most occurrences were observed in 2019 and 2020, by means of website (39%), webpage (22%), email (20%), URL (12%), and others (7%).

According to Alabdan (2020) [46] and Chiew et al. (2018) [55], phishing can be broken down into three main components. The first component is the medium, meaning the method (e.g., voice, SMS/MMS, and Internet) by which the phisher interacts with the target. The second component is the vector, that is, the channel through which the phishing attack is conducted, with the main categories being vishing, smishing, email, instant messaging, social networks, and websites. Vishing is the method of phishing that uses voice, through a traditional phone, a mobile phone, or VoIP. VoIP is a low-cost solution that can effectively obscure the actual physical location of the caller, and VoIP calls can be almost indistinguishable from legitimate calls. Smishing is the use of SMS/MMS for phishing attacks and can be implemented by sending a message to a victim that pretends to originate from a trusted authority, or by sending a message that contains malware (or links to a website infested with malware). Instant messaging (IM), compared to the vectors mentioned above, enables attackers to leverage audio, video, emojis, photos, files, and hyperlinks in their phishing attacks, which may be more effective in capturing the victim’s attention and hence in luring them to reveal personal information. Social networks allow people to communicate, connect, and share experiences and are hence an exceptional resource for phishers to identify groups of targets and approach victims. Finally, attackers also often prefer fraudulent Websites, disguised so that they appear legitimate, which can then be used to intercept personal details when the victim attempts to log in or visit them.

The third component according to Alabdan (2020) [46] and Chiew et al. (2018) [55] is the technical approach that phishers employ to gain access to the victim’s personal details, with the main categories being spear phishing, whaling, business email compromise (BEC), QRishing, social engineering, man-in-the-middle, and mobile phones. The technical approaches may function independently or in combination. Business email compromise (BEC) is a sub-type of spear phishing that focuses on governmental services, commercial organizations, or other big entities, aiming to compromise the corporate emails of their employees and use these in the attackers’ favor. QRishing is a phishing attack relying on the fact that QR codes are hard for a person to interpret before being decoded by a QR code reader, and on the fact that many QR readers do not ask for the user’s approval before accessing the QR code’s content, thus leading the victim to malicious URLs engineered by the attacker. Social engineering is one of the oldest techniques available to phishers and is defined as the manipulation of a person by abusing the victim’s emotions, gullibility, charity, or trust. In the man-in-the-middle attack, a malicious user intercepts a direct communication between two parties, namely an end user (victim) and a service provider. The attacker then reconfigures the data used by the victim and contacts the service provider pretending to be the legitimate user of the service, with the intent to steal their credentials, account information, or financial data, and/or to use the resources that the service authorizes for the legitimate user. For mobile phones, the most common form is call phishing, where phishers pretend to be a legitimate organization such as a bank or tax agency and instruct the user to share their personal and sensitive information.

In another study, Aleroud et al. (2017) [47] propose a phishing taxonomy in which an attack can span four dimensions: communication media, target environments, attack techniques, and countermeasures. For communication media, seven types are identified from the literature, namely e-mails, websites, instant messaging (IM), online social networks, blogs and forums, mobile, and voice over IP. Among them, emails and websites are the most frequently studied. The target environment relates to the physical device(s) that the victims interact with and can be classified as personal computers (PC), smart devices, and typical voice devices (e.g., desk phones). Attack techniques are grouped into three categories based on their purpose, namely attack initialization, data collection, and system penetration. For attack initialization, the most commonly employed techniques include the use of spoofed URLs, bogus IVR, social networking, man-in-the-middle (MITM) attacks, spear phishing, spoofing mobile browsers, and embedded web content. Data collection techniques aim to gather sensitive data from the victim and mainly rely on fake web forms, key loggers, recorded messages, automated social engineering bots, and fake event invitations. Finally, system penetration techniques are used to exploit system resources that can later be leveraged to facilitate subsequent phishing attacks, and fall into two main categories: Fast-Flux, a DNS-related technique that protects phishing sites from being taken down by hiding the machine hosting them, and cross-site scripting, in which malicious scripts are injected into otherwise benign and trusted websites, usually when an attacker uses a web application to send malicious code.

In Fig. 2, we present a detailed analysis of the variations of phishing attacks, based on the three examined taxonomies, showing all the interlinks between the vectors (channels) through which the phishing attack is conducted, the mediums exploited during a phishing attack and the technical approaches that the phishers employ to gain access to the victim’s personal details. As shown in the figure, in the context of a phishing attack, a victim can be a target of a combination of technical approaches (from one or multiple vectors) that may be used by the phisher aiming for a better success rate. The knowledge of these interlinks is important to develop countermeasures that target each specific vector and to introduce policies and guidelines that prevent system or infrastructure exploitations from malicious activities.

Fig. 2 Detailed analysis of the variations of phishing attacks showing all the interlinks between the vectors through which the phishing attack is conducted, the mediums exploited during a phishing attack and the technical approaches that the phishers employ to gain access to the victim’s personal details

3 Electroencephalography and eye-tracking apparatus

3.1 Electroencephalography apparatus

Although the first human EEG was recorded by Hans Berger in 1924 [56], the electroencephalogram as a concept emerged in 1875, when Richard Caton reported in the British Medical Journal that animals with exposed cerebral hemispheres present electrical phenomena. EEG employs the principle of differential amplification, i.e., recording voltage differences between distinct cerebral points using a pair of electrodes that compares one active exploring electrode site with another neighboring or distant reference electrode. EEG belongs to the technology of brain-computer interfaces (BCI), which provide the brain with a non-muscular communication channel for conveying messages and commands to the external world. It is a non-invasive BCI method in which the signal used as input for BCI applications is the electrical activity recorded through electrodes positioned on the scalp, measuring postsynaptic brain activity associated with task-related or internal stimulation. This technique is used to measure different types of neural activity such as evoked responses (ERs), also known as evoked potentials (EPs) [57]. EEG is simple, non-invasive, portable, and cost-effective, and its temporal resolution is higher than that of many other brain imaging methods. EEG captures changes within milliseconds, in contrast to other methods that may experience a delay on the order of seconds or minutes, and because of this it is often used to evaluate the time course of changes in brain activation across different brain regions.

A typical EEG arrangement includes a cap carrying contact electrodes and wires that connect the electrodes to amplifiers, which improve the quality of the acquired signals and pass them through analog-to-digital conversion so that brain signals can be stored on a computer for further research [57]. The electrodes used to acquire brain signals are either wet electrodes, which are attached to the scalp with conductive paste and often special caps, or dry electrodes, which do not require any conductive gel. Dry electrode technology achieves excellent standards compared to wet electrodes while reducing the time needed to apply sensors and enhancing user comfort [58]. The electrodes can be arranged on the scalp following one of the international 10–20, extended 10–20, international 10–10, and international 10–5 standards. In these standards, locations on the head surface are described by relative distances between cranial landmarks.

The international 10–20 system (Fig. 3) was the first standardized system; it was presented at the 2nd International Congress of IFSECN in Paris in 1949 and published by Jasper in 1958 [59]. The system is based on the relationship between the location of an electrode and the underlying area of cerebral cortex. The numbers “10” and “20” refer to the fact that the distances between adjacent electrodes are either 10% or 20% of the front-back or right-left distance of the skull. The primary purpose of the 10–20 system is to provide a reproducible method for placing a relatively small number (typically 21) of EEG electrodes. In 1991, an extension to the original 10–20 system was accepted by the American Clinical Neurophysiology Society (ACNS) and by the International Federation of Clinical Neurophysiology (IFCN), which increased the number of electrodes from 21 up to 81. This extended the “10–20” system of electrode placement by what is known as the “10% system,” referred to as the “10–10” system. However, high-end users still needed even higher-density electrode settings, and hence in 2001 an extension to the “10–10” system was proposed, namely the “10–5” system, enabling the use of more than 320 electrode locations [60].
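The 10%/20% spacing rule can be made concrete with a short calculation. The sketch below places the midline sites along the front-back (nasion to inion) arc using the conventional 10–20 spacing of 10% to the first site and 20% between subsequent sites. The function name and the 36 cm head measurement are illustrative assumptions, and Fpz is included as the first midline site by convention even though it is not among the sites listed in the Fig. 3 caption.

```python
# Step to each midline site, as a percentage of the nasion-inion arc length.
# 10% to the first site, then 20% between sites; the remaining 10% reaches the inion.
MIDLINE_STEPS_PERCENT = [("Fpz", 10), ("Fz", 20), ("Cz", 20), ("Pz", 20), ("Oz", 20)]


def midline_positions(nasion_inion_cm: float) -> dict:
    """Distance of each midline site from the nasion, in centimetres."""
    positions = {}
    cumulative_percent = 0
    for name, step in MIDLINE_STEPS_PERCENT:
        cumulative_percent += step
        positions[name] = round(cumulative_percent * nasion_inion_cm / 100, 1)
    return positions


# Hypothetical head measurement of 36 cm from nasion to inion.
print(midline_positions(36.0))
# {'Fpz': 3.6, 'Fz': 10.8, 'Cz': 18.0, 'Pz': 25.2, 'Oz': 32.4}
```

Because every position is expressed as a fraction of the measured arc, the same rule yields comparable placements across heads of different sizes, which is the reproducibility property the system is designed for.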

Fig. 3 The international 10–20 EEG placement system. Left panel: the 10–20 system or international 10–20 system. Right panel: modified combinatorial nomenclature system (MCN) 10–10 system. Each electrode placement site uses the numbers 1, 3, 5, 7, and 9 for the left hemisphere and 2, 4, 6, 8, and 10 for the right hemisphere, and a letter to represent the specific lobe or area of the brain: frontal (F), temporal (T), parietal (P), occipital (O), and central (C). Sites with the suffix Z refer to electrodes placed on the midline sagittal plane of the skull (Fz, Cz, Pz, and Oz) and are present mostly as reference/measurement points [54, p. 75–76]

Changes in EEG signals are highly associated with different cognitive functions, such as perception, emotion, and cognition [61]. Table 1 lists the EEG wave properties analyzed by frequency band (measured in Hz), the corresponding brain region, and the states that relate to different human activities [61, 62].

Table 1 EEG wave properties analyzed by frequency band (measured in Hz), the corresponding brain region and the states that relate to different human activities
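To illustrate what analysis by frequency band involves, the following sketch estimates the power of a signal in each band and reports the dominant one. The band limits are commonly quoted conventional values and are an assumption here; exact boundaries vary between sources and may differ from those in Table 1. A naive discrete Fourier transform is used only to keep the example dependency-free; a real pipeline would typically use an FFT library with windowing and artifact rejection.

```python
import cmath
import math

# Conventional band limits in Hz (assumed; boundaries vary between sources).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(samples, fs):
    """Sum spectral power per band using a naive O(n^2) DFT."""
    n = len(samples)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
                break
    return powers


# Synthetic two-second "recording": a pure 10 Hz oscillation sampled at 128 Hz.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
powers = band_powers(signal, fs)
dominant = max(powers, key=powers.get)  # band with the highest power
```

For this synthetic input the 10 Hz component falls inside the assumed alpha range, so `dominant` evaluates to `"alpha"`.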

3.2 Eye-tracking apparatus

Eye-tracking is an experimental method of observing and recording eye motion and the allocation of visual attention. An eye-tracker measures where, how, and in what order gaze is being directed during a specific task, making the eye-tracking apparatus a reliable tool for investigating problems related to the visual attention, behavior, needs, emotional states, desires, and cognitive processes of a person. Cognitive processes such as perception, memory, language, and decision-making are known to be influenced by gaze behavior [63]. The eyes reflect the mental processing of whatever is being looked at at any given moment, which makes eye-tracking broadly applicable to most research that explores mental processes. Because of its high temporal sensitivity, eye-tracking not only reveals indications of the outcome but also provides moment-by-moment insight into the unfolding cognition [64].

In the past twenty (20) years, the use of eye-tracking in various fields of research has received increased interest from the research community. Improvements in eye-tracking technology have made it more affordable and user-friendly for participants and researchers. Recent technological advancements in hardware and software have contributed to the development of eye-tracking applications. Cumbersome, slow, and expensive equipment has been replaced by inexpensive, unobtrusive, and wearable devices, which produce meaningful data for subsequent analysis [65].

Additionally, recent advances in computing capabilities enable the integration of machine learning (ML) algorithms into eye-tracking devices, turning them into intelligent eye-tracking devices, and various hardware and software approaches have been implemented by research groups and companies [66]. Nowadays, the most popular eye-tracking system is the head-mounted video-based tracker that may be used in daily activities. Four eye-tracking techniques have been the focus of most studies in this field and in the development of novel eye-tracking applications: the scleral search coil (SSC), infrared oculography (IOG), electrooculography (EOG), and video-oculography (VOG) [66]. Table 2 summarizes these techniques, how they work, the advantages and disadvantages of each, and the applications for which they are typically used.

Table 2 Eye-tracking techniques used in developing eye-tracking applications, how they work, advantages, disadvantages and applications typically used for

The most prevailing gaze-based metrics that are utilized in the literature [65, 67] are presented in Table 3.

Table 3 Potential eye-tracking metrics and indicators to measure cognitive load
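To make the fixation-based metrics more concrete, the sketch below derives fixations from raw gaze samples using a dispersion-threshold approach (commonly known as I-DT). This is only one of several possible detection methods and is not necessarily the one used in the surveyed studies. The thresholds, the sample format, and the function names are assumptions chosen for illustration, and the gaze samples are synthetic.

```python
def dispersion(window):
    """Dispersion of a window of (t, x, y) samples: x-range plus y-range."""
    xs = [x for _, x, _ in window]
    ys = [y for _, _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion, min_duration):
    """Return fixations as (start_ms, end_ms, centroid_x, centroid_y) tuples."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the window until it spans at least min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough samples left for another fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay close together.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(x for _, x, _ in window) / len(window)
            cy = sum(y for _, _, y in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations


# Synthetic gaze samples (time in ms, position in pixels), one sample every 20 ms.
samples = [(20 * k, 100 + k % 2, 100) for k in range(10)]         # first fixation
samples += [(200, 200, 150), (220, 300, 250)]                      # saccade
samples += [(240 + 20 * k, 400, 300 + k % 2) for k in range(10)]   # second fixation

fixations = detect_fixations(samples, max_dispersion=25, min_duration=100)
fixation_count = len(fixations)
mean_fixation_duration = sum(end - start for start, end, _, _ in fixations) / fixation_count
print(fixation_count, mean_fixation_duration)
```

Fixation count and mean fixation duration are then simple aggregates over the detected fixations; other metrics would need additional signals (for example, pupil diameter) that this sketch does not model.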

4 Methodology and research questions

This survey performs a systematic analysis of existing works that embraced unimodal (EEG or eye-tracking) or multimodal (combination of EEG and eye-tracking) apparatus for phishing research, approaching the subject across the three pillars shown in Fig. 4. The first pillar refers to the experimental design practices, with an emphasis on the applied EEG and eye-tracking acquisition protocols. More specifically, it examines the EEG device and montage in terms of electrode placement and number of channels, the type of eye-tracker most often employed (portable or desk-mounted), and the eye-tracking method used. It also examines the participants’ background, such as the number of participants in the experiments, the participants’ demographic data contrasted against their primary task, and the research question that each experiment attempted to answer.

Fig. 4

The research model that was elaborated for a systematic analysis of existing works that embraced unimodal (EEG or eye-tracking) or multimodal (combination of EEG and eye-tracking) apparatus for phishing

The second pillar refers to the artificial intelligence and signal preprocessing techniques applied in those experiments. According to this pillar, the survey studies the analyzed channels of the examined EEG system, the existence of reference and ground electrodes, the participants’ eye-tracking metrics as a response to the exposed phishing attack type, which preprocessing methods were employed, which feature extraction and classification methods were most applied in the considered experiments, and the accuracy that was reported per experimental setup.

The third pillar refers to the phishing attack types. From this perspective, we try to identify the phishing attack types and their relation to the activation of specific brain areas, and we examine the participants’ eye-tracking metrics as a response to the phishing attack type to which they were exposed.

4.1 Research questions

Based on the aforementioned research model, we formulated the following research questions related to the application of EEG and/or eye-tracking technologies against phishing attacks:

RQ1: Which are the most applied experimental setups and how are these related to a) the number of participants, b) the EEG montage and/or eye-tracking setup, c) the EEG metrics and eye-tracking metrics employed, and d) the performance of these experiments? By answering this research question, we will be able to provide insights on the most effective means for coping with each of the most employed phishing schemes, as well as formulate a solid springboard for new researchers entering the field to obtain a better overview of the current state-of-the-art on the subject.

RQ2: What are the investigated phishing attack types and how do they relate to cognitive processes and brain activity? By answering this question, we can identify what type of phishing is dominant in the interests of the research community and the white space in phishing types of research in terms of the cognitive and brain responses.

4.2 Research methodology

4.2.1 PRISMA setup process

Several well-known digital library databases were selected for the literature search, based on their relevance to the computer science community. We performed a systematic search within the following digital libraries: Elsevier ScienceDirect, IEEE Xplore, ResearchGate, Springer, and the ACM Digital Library. To ensure compliance with research standards, the PRISMA method [68] was employed. As the main research objective of this review is to examine articles that contain results of at least one EEG-based and eye-tracking-based experimental setup within a phishing context, the following keywords were selected: [phishing AND EEG], [phishing AND “eye-tracking” OR eye-tracking], [phishing AND BCI].

The initial process for our data collection began as a broad search for the terms n1 = [phishing AND “eye-tracking” OR “eye-tracking”], n2 = [phishing AND EEG], and n3 = [phishing AND BCI] in the Elsevier ScienceDirect, IEEE Xplore, ResearchGate, Springer, and ACM Digital Library databases. This generated 651 articles. From the initial search, we excluded duplicate records, articles that were not journal or conference papers (e.g., books and presentations), and those that were from a different domain (e.g., health, social sciences, psychology, and education), arriving at a total of 327 papers.

4.2.2 PRISMA screening process

From this collection, we performed title, abstract, and full-text screening manually, without employing any automated means, to identify papers that satisfied our inclusion and exclusion criteria. To be included, a paper needed to be primarily focused on the topic of phishing. The following inclusion criteria were used to identify and extract the useful literature from the search string: research articles should be conference or journal papers, they should investigate phishing as well as include references to EEG, eye-tracking, eye gaze, and BCI aspects, and they should have been published in the last ten years (2012–2022). Papers were excluded if the above inclusion criteria were not fulfilled, if they were an extended abstract or a work in progress, if the primary language in which they were written was not English, or if they were found not to be related to phishing, even if they mentioned phishing somewhere in the paper. After applying the inclusion and exclusion criteria to the collected sample of 327 papers, 285 papers were excluded, and 42 papers were selected for further processing. Following a full-text eligibility assessment of these papers, 29 were excluded for not containing any experiments; thus, 13 experimental papers remained for the course of the study. During the full-text study, 5 papers found in the references of the examined papers were added, so the number of papers included in our SLR was 18 (Fig. 5).

Fig. 5

PRISMA flow diagram related to selection process, where n1, n2, n3 denotes search criteria: n1 = [phishing AND “Eye-tracking” OR eye-tracking], n2 = [phishing AND EEG], and n3 = [phishing AND BCI]

Based on the PRISMA selection method outlined above, five (5) research papers have been retrieved that utilize experimental designs embracing an EEG apparatus in phishing research. Moreover, thirteen (13) research papers employ eye-tracking devices in their experimental setup. Of these eighteen entries, two (2) papers rely on both methodologies and are therefore counted in both groups, resulting in 16 distinct papers studied (see Table 4).
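The counts reported for the selection process can be cross-checked with a few lines of arithmetic. The sketch below only reproduces the figures stated in the text; the variable names are hypothetical.

```python
# Counts as reported in the PRISMA setup and screening description.
identified = 651                   # records returned by the initial search
after_initial_exclusions = 327     # after removing duplicates, non-papers, other domains
excluded_by_criteria = 285         # removed by the inclusion/exclusion criteria
excluded_no_experiment = 29        # removed at full-text stage (no experiments)
added_from_references = 5          # found in the references of the examined papers

eligible = after_initial_exclusions - excluded_by_criteria    # 327 - 285
experimental = eligible - excluded_no_experiment              # 42 - 29
included = experimental + added_from_references               # 13 + 5

# Overlap between the two apparatus groups.
eeg_papers = 5
eye_tracking_papers = 13
papers_using_both = 2
distinct_papers = eeg_papers + eye_tracking_papers - papers_using_both

print(eligible, experimental, included, distinct_papers)
```

The last step is the inclusion-exclusion count: papers that use both apparatus types appear in both groups and must be subtracted once.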

Table 4 The 16 research papers studied and relevant utilized apparatus (x stands for the approach examined in each study)

5 Analysis of results

5.1 RQ1: EEG and eye-tracking experimental design practices in phishing

5.1.1 Phishing and EEG

To obtain data of satisfactory quality, it is important to choose representative samples across the spectrum of demographic characteristics and other factors. All five (5) experiments examined in our survey followed the standard procedure of recording the demographic data of the participants. Most of the literature reviewed exhibited a higher proportion of male participants, indicating a gender imbalance. Given the conflicting research findings regarding the influence of gender on phishing detection, it would be valuable to validate the experimental results using a more representative sample that encompasses a broader range of genders. Similarly, the age varied between 18 and 34 years in all papers included in our analysis; nonetheless, an interesting observation was highlighted in [69], where the researchers found differences between the participants belonging to the 19–22 and 30+ age groups. Further studies may be needed to support these findings, especially considering that, as stated in our introduction, results regarding age differences are contradictory. Regarding the participants’ background, they were mainly university students. Although this explains the lower age groups examined, it also calls for further experimentation with other age groups, to assess the generalizability of the studies and the effectiveness of these methods for a population with a wider age range.

5.1.2 EEG-montage and preprocessing

With respect to the EEG montage, it is important to analyze the electrode placement and the number of channels, because they can provide details about which brain areas are activated during a task and how the brain activity relates to the specific task. As shown in Table 5, most experiments use the electrode distribution of the international “10–20” system [59], whereas the number of electrodes used ranges from 2 to 256 and the sampling rate varies from 256 to 1000 Hz. Neupane et al. (2015) [69] use an EEG headset that provides 10 channels of data, collecting EEG data from the Fz, F3, F4, C3, Cz, C4, P3, POz, and P4 sites with a 256-Hz sampling frequency. Rahman et al. (2019) [70] utilize 14 channels of data (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4) with two sensors as reference, and report that mostly the frontal lobe and parietal lobe sensors (AF3, F3, FC5, F7, P7, and P8) are highly activated for the phishing detection task. Valecha et al. (2020) [71] use an EEG headset with a 64-electrode cap and a 500-Hz sampling frequency, and Sun and Yeh (2017) [72] utilize 2 electrodes with a 512-Hz sampling frequency. Finally, Hashem et al. (2017) [73] use an EEG headset that follows the 10–20 (Geodesic) EEG system with 256 electrodes and a 1000-Hz sampling frequency. Regarding the reference electrodes, although several studies [84] indicate that the selection of reference electrode(s) can affect the estimation of certain EEG measures (e.g., connectivity), reference electrodes have not been reported in most of the experiments, with the exception of the study of Rahman et al. (2019) [70].

Table 5 Design of experiments and accuracy achieved in phishing and EEG experiments. An asterisk denotes the classification method that yields the highest accuracy among the methods evaluated in each study. Columns: I. EEG placement, II. No. of exploited channels, III. Sampling (Hz), IV. Preprocessing, V. Feature extraction, VI. Classification, and VII. Accuracy

EEG recordings tend to contain noise and artifacts, which may affect the experimental analysis and results. Therefore, it is essential to apply preprocessing and denoising to eliminate such artifacts, as well as any additional noise originating from electromyography (EMG) and electrooculogram (EOG) activity. To address RQ1, we investigated which preprocessing methods were employed in the literature examined in our survey. As shown in Table 5, MATLAB toolboxes, low-pass or high-pass filters, and a variety of similar proprietary software tools tailored to the preprocessing task (e.g., B-Alert Lab (BAL) Software provided by ABM, ThinkGear technology) were used for this purpose.
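As a rough illustration of what high-pass style preprocessing does, the sketch below removes a slow drift from a synthetic signal by subtracting a moving-average baseline. This is a deliberately crude stand-in, not the filtering used in the surveyed studies, which rely on dedicated toolboxes and properly designed filters. The signal, the 256-Hz sampling rate, the one-second window, and the function names are assumptions for illustration.

```python
import math


def moving_average(signal, window):
    """Centered moving average; near the edges only the available samples are used."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def remove_drift(signal, window):
    """Crude high-pass: subtract a slow baseline estimated by a long moving average."""
    baseline = moving_average(signal, window)
    return [s - b for s, b in zip(signal, baseline)]


fs = 256                                    # assumed sampling rate in Hz
t = [n / fs for n in range(2 * fs)]         # two seconds of samples
rhythm = [math.sin(2 * math.pi * 10 * x) for x in t]   # synthetic 10-Hz oscillation
drift = [5.0 + 2.0 * x for x in t]                     # synthetic offset plus slow drift
raw = [r + d for r, d in zip(rhythm, drift)]

cleaned = remove_drift(raw, window=fs)      # one-second baseline window

# Away from the edges, the cleaned signal should closely follow the oscillation.
interior = range(fs // 2, len(raw) - fs // 2)
max_error = max(abs(cleaned[i] - rhythm[i]) for i in interior)
print(max_error)
```

The window length sets the trade-off: a longer window preserves more low-frequency content but follows the drift less closely, and samples near the edges are less reliable because the window is truncated there.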

5.2 Feature extraction, classification, and accuracy

Similarly to preprocessing, in emotion recognition analyses that rely on EEG signals, feature extraction is often considered beneficial towards improved emotion classification performance [85]. To address RQ1, we investigated which feature extraction and classification methods are most often applied in the considered experiments and noted the accuracy reported per experimental setup. As shown in Table 5, WPD with MATLAB [73], PARAFAC 2 [70], and ICA [71] are used for the feature extraction task, and several methods (including Bayesian regression, lazy/ensemble learning, and others) are used for classification. Hashem et al. (2017) [73] examine four different classification algorithms (support vector machine (SVM), k-nearest-neighbors (k-NN), random forests, and bagging predictors), with SVM achieving the highest accuracy (99.77%). Rahman et al. (2019) [70] compare five classifiers (BayesNet, logistic regression, JRip, IB1, and random forest), with the best performance yielded by the random forest classifier (97% classification accuracy). Finally, Neupane et al. (2015) [69] use the quadratic discriminant function classification algorithm (QDA) for their analysis, reporting a classification accuracy of 69.69%.
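To illustrate how a classification accuracy of the kind reported above is obtained, the following sketch evaluates a small k-nearest-neighbors classifier with leave-one-out validation. k-NN is only one of the classifiers mentioned, and this is not the implementation of any surveyed study. The two features and all data values are synthetic and hypothetical, chosen so that the classes are clearly separated.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbours."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Synthetic feature vectors (two hypothetical normalized features per trial).
data = [
    ((0.90, 0.80), "phishing"), ((1.00, 0.90), "phishing"),
    ((0.80, 1.00), "phishing"), ((0.95, 0.85), "phishing"),
    ((0.10, 0.20), "legitimate"), ((0.20, 0.10), "legitimate"),
    ((0.15, 0.15), "legitimate"), ((0.05, 0.25), "legitimate"),
]

accuracy = leave_one_out_accuracy(data, k=3)
print(accuracy)
```

On real EEG or gaze features the classes overlap far more than in this toy data, so accuracies depend strongly on the features, the validation scheme, and the number of participants.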

5.2.1 Phishing and eye-tracking

Based on the PRISMA selection method outlined in Section 4.2, thirteen (13) experiments relevant to eye-tracking apparatus have been identified, containing evaluations for the reviewed phishing attack types.

Participants demographics

The number of participants in these studies ranged between 20 and 30, and the age range was between 18 and 34 years, with a mean age of approximately 20 years, which is representative of the group of users who use the Internet frequently and who are supposedly more vulnerable to phishing attacks. Alsharnouby et al. (2015) [75] find no statistically significant relationship between participants’ ages and their scores, whereas Neupane et al. (2015) [69] identify differences between the participants belonging to the 19–22 age group and those aged more than 30, which indicates some white space for further experimentation on the effect that age could have on such phishing attacks. Our analysis of the reviewed papers revealed an imbalance in the representation of genders, with a predominance of male participants in eight experiments, while in one experiment [80] 90% of the participants were female, and in four experiments [17, 73, 81, 82] the gender was not recorded. Finally, Huang et al. (2022) [81] diversify the participants in their experiment (concerning race, gender, and age) and adopt the feedback loop of Bayesian optimization to make a more comprehensive study of human behaviors that covers different user groups.

Regarding the background, the participants were university students [73, 74, 76, 79, 81, 82, 83] or non-students working in the academic environment (working professionals, technical staff, and scientific staff) [69, 77, 78]. As with the EEG experiments, examining the generalizability of these studies to a larger population would be an interesting future direction.

Eye-tracking method and metrics

Video oculography (VOG) is used as the eye-tracking method in all experiments, as it is a non-invasive method that has been proven to yield better results in terms of accuracy, can capture eye movement disorders, is relatively easy to use, allows head movements by the participants, and can be recorded fully remotely. The type of eye-tracker most often employed consists of a remote desk-mounted system with multiple cameras; however, a head-mounted eye-tracker was also employed in two experiments. Finally, the area of interest (AOI) is used in most of the papers, and the fixation metric prevails in most of the experiments among the several metrics measured, as shown in Table 6.
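The combination of AOIs and fixation metrics can be sketched as follows: each fixation is assigned to the rectangular AOI that contains it, and per-AOI fixation counts and dwell times are accumulated. The AOI names, screen coordinates, and fixation values are hypothetical and serve only to illustrate the computation.

```python
def aoi_metrics(fixations, aois):
    """Accumulate fixation count and dwell time (ms) for each rectangular AOI.

    fixations: list of (start_ms, end_ms, x, y)
    aois: dict mapping name -> (left, top, right, bottom) in pixels
    """
    metrics = {name: {"fixation_count": 0, "dwell_time_ms": 0} for name in aois}
    for start, end, x, y in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                metrics[name]["fixation_count"] += 1
                metrics[name]["dwell_time_ms"] += end - start
    return metrics


# Hypothetical AOIs on a screenshot of a login page.
aois = {
    "address_bar": (0, 0, 800, 40),
    "login_form": (200, 200, 600, 400),
}

# Hypothetical fixations: (start_ms, end_ms, x, y).
fixations = [
    (0, 180, 100, 20),      # on the address bar
    (240, 420, 400, 300),   # on the login form
    (500, 650, 420, 310),   # on the login form
    (700, 800, 700, 500),   # outside both AOIs
]

metrics = aoi_metrics(fixations, aois)
print(metrics)
```

A comparison of dwell time on security indicators (such as the address bar) against dwell time on page content is one way such per-AOI metrics can be related to a participant's phishing judgment.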

Table 6 Phishing attack type and eye-tracking metrics in phishing and eye-tracking experiments. The fixation metric is the most commonly measured among the metrics identified in the experiments examined (x stands for the times a metric is found in examined experiments)

5.3 RQ2: what are the investigated phishing attack types and how do they relate to cognitive processes and brain activity?

5.3.1 Phishing and EEG

Rahman et al. (2019) [70] and Neupane et al. (2015) [69] examined websites as the category of phishing attacks. In Rahman et al. (2019) [70], the research question concerned which brain areas are highly activated during a phishing website detection task and what the relationship is between the brain activity and the phishing detection task. In Neupane et al. (2015) [69], the authors address the question of how users behave as they process, interpret, and operationalize security information when making security decisions. In both experiments, the primary task of participants was to select whether a website shown to them was legitimate or fake. The brain data collected from the human scalp in both experiments showed that mostly the right frontal lobe and parietal lobe areas, typically involved in decision making, reasoning, and attention, are highly activated during phishing detection. Valecha et al. (2020) [71] use e-mail as the phishing attack type. In that experiment, the primary task of the subjects was to respond to a mix of phishing and benign emails or to decide whether the emails were genuine. The research question aims to assess the role of cognitive responses and correlated brain responses within the phishing context. The collected data showed that both the right inferior frontal and central parietal areas are responsible for adaptive decision-making and performance monitoring in phishing attacks. Finally, Hashem et al. (2017) [73] and Sun and Yeh (2017) [72] do not focus on a specific phishing attack type but approach the subject from the more general angle of malicious activity detection. In their experiments, the primary task of the participants was to identify benign and malicious activity tasks, and their research attempted to answer how the user’s brain processes malicious and benign activities using electroencephalogram (EEG) signals.

To summarize the phishing attack types within EEG experimentation, we note that web-based phishing attacks are present in two experiments and e-mail in one experiment, while in two experiments no specific phishing attack type was examined and a spectrum of malicious activities was reviewed instead. Based on the factor analysis, the frontal lobe and parietal lobe areas, which are more dominant in decision making, reasoning, and attention, are highly activated during the phishing detection task.

5.3.2 Phishing and eye-tracking

Extending the investigation of RQ2, in our analysis of published experiments on eye-tracking and phishing, we also examined the participants’ eye-tracking metrics as a response to the phishing attack type to which they were exposed.

Ramkumar et al. (2020) [74], Neupane et al. (2015) [69], Alsharnouby et al. (2015) [75], Darwish and Bataineh (2012) [77], Miyamoto et al. (2015) [76], Miyamoto et al. (2014) [79], and Xiong et al. (2017) [83] focused on analyzing phishing attacks that utilize websites as the means. In all experiments, the primary task of the participants was to determine whether a website was legitimate (real, safe) or fraudulent (fake, unsafe). In Miyamoto et al. (2015) [76] and Xiong et al. (2017) [83], the participants were presented with screenshots of a browser that rendered websites or with screenshots of webpages, respectively. Each of the experiments attempted to answer a different research question: Ramkumar et al. (2020) [74] ask how users behave (in terms of the cognitive processes involved) as they process, interpret, and operationalize security information when making a security decision; Neupane et al. (2015) [69] address how users process the task of detecting phishing attacks, utilizing eye gaze patterns captured by an eye-tracker; Alsharnouby et al. (2015) [75] investigate which strategies users employ to determine the legitimacy of websites; Darwish and Bataineh (2012) [77] ask what the natural user viewing behavior is when exposed to a phishing attack; and Miyamoto et al. (2015) [76] evaluate the correlation between eye movements and phishing identification. Last, Xiong et al. (2017) [83] ask how users allocate attention during web page browsing.

Pfeffel et al. (2019) [78], McAlaney and Hills (2020) [80], Huang et al. (2022) [81], Yang et al. (2017) [17], and Anderson et al. (2013) [82] focused on analyzing phishing attacks involving the use of e-mails. In Pfeffel et al. (2019) [78], the participants were asked to distinguish phishing mails from legitimate ones, and the researchers investigated on what basis the users decided whether they were confronted with a phishing mail or a legitimate one. In McAlaney and Hills (2020) [80], the participants were shown emails that either did or did not include a phishing indicator, and the research question was to identify the common elements of phishing emails that influence the victims’ processing and judgment of the email’s credibility. In Anderson et al. (2013) [82], the participants had to distinguish among previously seen emails, novel emails, and manipulated phishing emails, and the research question was how the eye movement-based memory effect influences users’ susceptibility to phishing. Finally, Huang et al. (2022) [81] and Yang et al. (2017) [17] use the eye-tracking method with e-mail as the attack type to verify the effectiveness of the method proposed in their respective papers.

Hashem et al. (2017) [73] did not focus on a specific phishing attack type but spanned their analysis across several types of malicious activities. In this experiment, participants conducted usual activities as well as malicious activities, and the researchers investigated how the user’s brain processed the malicious versus the benign activities by means of eye-tracking, while at the same time capturing the spontaneous responses that may come unfiltered by the conscious mind. To that end, three metrics were collected by the eye-tracker: saccades, fixation locations, and pupil diameter.

In summary, the website phishing attack type is present in seven out of thirteen experiments, e-mail phishing appears in five experiments, and the study of miscellaneous attacks is the subject of only one experiment. In terms of the eye-tracking metrics, several variations were considered, as shown in Table 6, with the fixation metric being the most common one.

6 Discussion and research suggestions

The present literature survey aims to provide a systematic overview of existing experimental phishing research that leverages EEG and/or eye-tracking apparatus. Towards this direction, we examined the interlinks governing a phishing attack across the vectors through which the attack is conducted, the mediums most frequently exploited, and the technical approaches that the attackers employ to gain access to the victims’ personal details. Our survey was focused on articles that contain at least one EEG-based and/or eye-tracking-based experiment within the context of a phishing attack and we analyzed a variety of montages, protocols, experimental setups as well as methods typically employed for signal preprocessing, feature engineering, and classification.

An interesting conclusion deriving from the examined literature is that users’ personality traits (e.g., attention control) may directly affect their phishing susceptibility, which suggests that users may be trained to detect phishing attacks more effectively if they sharpen their attention control skills. In contrast, user demographics do not provide similarly conclusive indications, which in turn opens the road for further exploration of whether fully personalized models, with task/service specificity and/or with the inclusion of advanced AI-driven techniques (e.g., large language models), could be beneficial towards the development of more robust anti-phishing detection systems.

Similarly, our survey analysis indicated that focusing exclusively on either brain-driven or gaze-driven signal analysis can yield models with satisfactory performance; nonetheless, less research has been done so far in the direction of multimodal anti-phishing frameworks that combine both sources, and/or of scenarios in which the victim is exposed to a spectrum of simultaneous phishing attacks (e.g., concurrently website, email, smishing, and vishing). In such complex scenarios, a more comprehensive investigation of the EEG and eye movement responses may reveal important insights on the ways in which cortical and brain activity, combined with other physiological variables, interplay and relate as a response to such orchestrated attacks.

An interesting future perspective would be the investigation and validation of the findings related to the two attack types studied so far (website and e-mail phishing) against other types of phishing (e.g., smishing, vishing, and social media). This would also provide insights on which brain areas are more dominant per phishing attack type and on whether tailored cognitive models per phishing type may be needed. Similarly, another direction for further exploration relates to understanding the effect that interventional training has on the users’ performance when it comes to phishing detection.

In a similar direction, the investigation of cognitive and brain responses triggered when the victim is exposed to a combination of phishing attacks (e.g., website and e-mail) may reveal insights on how each of them affects the receiver’s brain activity and the respective brain lobe regions that are stimulated by such an orchestrated attack. Last, an interesting perspective for further investigation would include the concurrent employment of eye movement measurements (eye-tracking) with brain activity measures (EEG), to assess how the eye movements, cortical activity, brain activity, and other physiological variables interplay and relate as a response to the behaviors of victims of phishing attacks (stand-alone or in combination).

Additionally, and as an interesting future perspective combining the best practices of the analyzed published research, we suggest a human-centric and AI-based phishing modeling approach which may provide a more comprehensive framework for identifying vulnerable users for phishing attacks (Fig. 6).

Fig. 6

Multimodal EEG and gaze-based anti-phishing framework

6.1 Suggestion I: multimodal anti-phishing frameworks

Current state-of-the-art anti-phishing frameworks approach the subject from a rather unimodal perspective and focus mainly on either gaze-based [17, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83] or EEG-based signal processing [70, 71, 72] to reason about the users’ emotional or cognitive state when experiencing a phishing attack. Despite the accuracy of such approaches, a multimodal framework that combines diverse and complementary data sources (e.g., as in [69, 73]) could be more successful in capturing variations in the users’ responses that are more challenging to surface when each source is examined in isolation. Such an approach can enable the development of personalized multimodal approaches, tailored per user and usage scenario (e.g., interaction with a web-banking service), that would be (re)trained and improved every time a user interacts with the service.

6.2 Suggestion II: standardization of pre-processing and feature extraction techniques

In the context of the present survey, we investigated which preprocessing methods were employed and which feature extraction methods are most applied per experimental setup. As presented in RQ1, MATLAB toolboxes [70, 71], low-pass or high-pass filters [73], and a variety of similar proprietary software tools [69, 72] were used for this purpose. Taking into consideration these approaches and the reported results, it would be interesting to explore the effectiveness of these methods in the context of a multimodal anti-phishing framework and their robustness across the varying types of phishing attacks.

6.3 Suggestion III: continuous training and extension to deep learning/large language model-based approaches

Aiming to address RQ1, we investigated the performance of several classification methods that are most applied in phishing experiments, with random forests and SVMs [70, 73] often scoring higher in terms of classification accuracy. Further exploration in this space, with the examination of more complex deep learning architectures and/or the inclusion of large language models (LLMs) within the training and inference processes would be another interesting direction to pursue. This could also extend to investigate whether the training on an individual basis (i.e., one model per user) can better address the conflicting conclusions from the existing research with respect to the influence the users’ demographics have on their phishing susceptibility [9, 14, 20, 21] and to what degree a continuous re-training on new user input can result in more performant phishing detection systems.

6.4 Suggestion IV: privacy preservation

The combination of user-specific brain-based and gaze-based features can raise privacy concerns, and for this reason biometric template protection (BTP) schemes should be considered. In this process, attention needs to be paid to enhancing irreversibility (i.e., an irreversible transformation of the biometric data needs to take place before these data are stored), unlinkability (i.e., the stored biometric references should not be linkable across different applications or databases), and renewability (i.e., being able to issue a new template, totally different from previous ones, in case the old template is lost or compromised), while at the same time preserving verification accuracy, speed, and storage requirements [86]. Moreover, self-sovereign identity (SSI) management architectures need to be further investigated, aiming to provide end-users with a viable solution for keeping control over diverse access levels to their anti-phishing models. Such a framework, secure in terms of privacy preservation, can also facilitate the transfer of scientific knowledge to other domains and enable model sharing across diverse organizations such as governments or health and finance institutions.
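The three BTP properties can be illustrated with a deliberately simplified sketch, in which a feature vector is coarsely quantized and then passed through a keyed one-way function (HMAC-SHA256) with an application-specific key. This is a toy construction introduced here for illustration, not a scheme proposed in [86] or in the surveyed studies: practical BTP schemes handle biometric noise far more robustly, and the feature values, keys, quantization step, and function name are all assumptions.

```python
import hashlib
import hmac
import secrets


def protected_template(features, app_key, step=0.5):
    """Quantize the features, then apply a keyed one-way transform.

    Quantization absorbs small measurement noise (but fails near bin edges);
    the HMAC makes the stored value hard to invert without the key.
    """
    quantized = ",".join(str(round(value / step)) for value in features)
    return hmac.new(app_key, quantized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical feature vectors from two sessions of the same user.
enrolled = [1.02, 3.49, 7.98]
later_session = [0.98, 3.51, 8.03]

# Hypothetical per-application keys (unlinkability across applications).
bank_key = b"bank-app-key-v1"
mail_key = b"mail-app-key-v1"

bank_template = protected_template(enrolled, bank_key)
mail_template = protected_template(enrolled, mail_key)

# Verification: the same user matches within one application.
matches = hmac.compare_digest(bank_template, protected_template(later_session, bank_key))

# Renewability: a fresh key yields a completely different template.
renewed_key = secrets.token_bytes(32)
renewed_template = protected_template(enrolled, renewed_key)

print(matches, bank_template != mail_template, renewed_template != bank_template)
```

Note that exact-match hashing is brittle for noisy signals such as EEG, and that low-entropy feature vectors could still be guessed by an attacker who obtains the key, which is why dedicated BTP schemes are needed in practice.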

In Fig. 7, we describe a brief scenario that demonstrates the usefulness and anticipated value of such an approach:

Fig. 7

User scenario that demonstrates the envisioned anti-phishing framework

7 Conclusion

The purpose of this survey is to analyze articles that focus on conducting experiments using EEG-based (electroencephalography) and eye-tracking apparatus within a phishing context. The survey findings indicate that the most commonly studied phishing attack types were website and email phishing. These experiments typically involved university students or academic personnel as participants and were conducted in controlled laboratory environments, which may have limited ecological validity.

The controversy observed regarding the influence that demographic factors can have on phishing susceptibility, the narrow participant demographics, the employment of controlled experimental conditions, and the typically isolated examination of either brain-based or gaze-based signals point towards interesting future research directions for improved phishing detection and prevention.

More specifically, we recommend conducting additional research to explore the applicability and generalizability of these findings to other commonly encountered phishing types, such as voice or SMS phishing. Furthermore, there is a need to assess the resilience of individuals facing multiple orchestrated attacks and to simultaneously analyze eye movement and brain activity measurements. Incorporating advanced AI methods, such as complex deep learning neural networks and/or large language models, could enhance the analysis. Additionally, expanding experimental evaluations to simulate real-life unsecure operating conditions would be beneficial, as it may contribute to the development of more robust anti-phishing mechanisms.

The authors hope that the present analysis will inspire more researchers working in the field to advance the current state of the art across the three pillars outlined in Section 4, based on the high-level framework proposed in Section 6.

7.1 Limitations

In the present survey we identified some limitations. A first limitation is the small number of papers found on phishing and EEG or eye-tracking apparatus. Although the original search for [Phishing] returned a fairly large number of papers dating from 2003 onwards, when the search was narrowed down to [Phishing AND EEG] and [Phishing AND Eye-tracking] only forty-two (42) papers were returned, largely skewed in terms of publication date towards the last decade (from 2012 onwards). In these 42 papers, five (5) experiments were identified related to [Phishing AND EEG] and thirteen (13) experiments in the field of [Phishing AND Eye-tracking]. Our search indicated that three (3) papers approach the subject from areas adjacent to EEG, namely fNIRS (functional Near Infrared Spectroscopy) [87] and fMRI (functional Magnetic Resonance Imaging) [20, 88].

Another limitation is that in almost all the examined experiments, a “secure” university lab environment was used, with students constituting the largest group of participants in the studies. This may have affected the ecological validity of the experiments, since the participants represent narrow demographics and may not have sensed authentic security threats due to the controlled environment. This opens the way for further exploration of the generalizability of these findings to a wider population, characterized by varying demographics and/or experiencing an attack under uncontrolled (real-world) operating conditions.

Finally, the last point also extends to another limitation, related to the operation of the biometric systems involved (EEG and eye-tracking devices). In all examined experiments, the participants were measured during a single visit in a controlled (university) environment. However, to capture signals indicative of real-life phishing attacks, these biometric systems would often need to be employed multiple times per day, potentially every day and/or over a longer period. This poses challenges regarding the ease of moving the necessary equipment outside laboratory environments to simulate real-world scenarios.