Introduction

Many students experience anxiety and fear before or during exams. Test anxiety is defined as the cluster of phenomenological, physiological, and behavioral responses that accompany anxiety and that may emerge due to the possible negative consequences of failure in a test or similar assessment (Zeidner, 1998). Test anxiety is considered a specific fear that is not apparent in daily life but becomes evident before or during an examination (Sarı et al., 2018). At a normal level, anxiety protects individuals; at a high level, however, it may cause a decrease or deterioration in performance by negatively affecting one's daily life. In their review, Putwain and Daly (2014) concluded that 15% to 22% of students experience high levels of test anxiety. While Cassady and Johnson (2002) stated that students with test anxiety demonstrate poor academic performance, Hamilton et al. (2021) place strong emphasis on the critical problems that test anxiety causes for university students.

Many factors trigger test anxiety, including the duration of an exam, the number of questions, the test technique used, the test instructions, and the environment in which the test is administered (Wadi et al., 2022). Ohata (2005) also discusses exam duration and the pressure it places on students. Wong (2008) found that cognitive structures such as irrational beliefs, dysfunctional attitudes, and negative automatic thoughts were significant predictors of test anxiety. Another factor that causes students to experience test anxiety is their parents' attitudes and behaviors (Ringeisen & Raufelder, 2015). In addition, some studies reported that anxiety was affected by personal characteristics and that test anxiety was a non-threatening and temporary situation that students experience (Hodapp et al., 2011). Segool et al. (2013) stated that test anxiety may occur especially during high-stakes exams and may be an obstacle to academic performance. Moreover, some studies examined gender differences in test anxiety and reported that female students experience higher test anxiety than male students (Zaheri et al., 2012).

Test anxiety creates intrusive thoughts and frustration accompanied by physical discomfort and agitation, which cause students to experience a sense of panic and failure. This situation leads them to consider exams as potential disasters (Maxfield & Melnyk, 2000). Some studies even reported other types of anxiety disorders in students due to test anxiety (Spielberger, 2010). Overall, test anxiety is associated with low academic performance (Segool et al., 2013), lack of motivation (Keller & Szakál, 2021), low self-esteem (Thomas et al., 2022), learning difficulties (Chapell et al., 2005), dropout, and high levels of depression (Leadbeater et al., 2012). In addition, people with test anxiety may show various psychological and physiological symptoms including forgetfulness, hypertension, increased breathing and heart rate, nausea, diarrhea, sweating, and difficulty in focusing (Huberty & Dick, 2006). Beers (2003) also stated that students with test anxiety may have difficulty in reading questions and finding correct answers, organizing and expressing their thoughts, and finding appropriate words. Test anxiety is a debilitating variable for academic performance from primary school to higher education (Kader, 2016). Manchado Porras and Hervías Ortega (2021) reported a negative correlation between test anxiety and academic performance, and a similar result was found by Brady et al. (2018). When people experience a high level of anxiety, their reasoning and abstract thinking skills are disrupted, which, in turn, negatively affects their academic performance. In addition, test anxiety may affect one's cognitive structure by creating failure and disappointment, which are observed through embarrassment, hypersensitivity, and memory problems (Huberty & Dick, 2006).

There are various methods to measure test anxiety level. Test anxiety was first considered a one-dimensional structure (Sarason et al., 1960). In the following years, it was determined to have a two-dimensional (Zeidner & Matthews, 2005) and then a multidimensional structure (Putwain & Daniels, 2010). According to Mowbray et al. (2015), the two dimensions of test anxiety are worry and emotionality. While the worry dimension includes internal conversations such as thinking about the consequences of failure and doubting one's ability to succeed, the emotionality dimension consists of physiological reactions related to anxiety during an examination. In another study, Zeidner and Matthews (2005) considered test anxiety in two dimensions: emotional-physiological and cognitive. While the cognitive dimension refers to negative thoughts and concerns during the exam period, the emotional-physiological factor refers to the unintentional experiences that occur before the exam period, such as not taking enough time to study for the exam and procrastination behaviors. In addition to cognitive and emotional factors, some scales include factors related to thoughts, off-task behaviors, autonomic reactions, and so on. The test anxiety measurement tools in the literature were generally designed for exams administered in face-to-face environments. With the emergence of the COVID-19 pandemic, there was an abrupt shift to online education. With online exams, students' test anxiety levels may change (Saadé & Kira, 2009). For instance, Block et al. (2008) reported that students had a high level of anxiety in their first online exams due to a lack of experience with and knowledge about them. A similar conclusion was reported by Wang et al. (2001).

Many parents were concerned that the hasty transition to remote study caused their children to miss out on learning, whereas students saw benefits. With the decrease in COVID-19 cases, many students and parents expected schools to resume face-to-face instruction. However, given the importance of digital literacy in preparing students for future challenges, it must be maintained (Jamilah & Fahyuni, 2022). Blended learning can be an alternative for post-COVID-19 learning because it preserves the online learning developed during the pandemic while also requiring physical presence and social interaction, which are key features of face-to-face learning (Andrew et al., 2021; Peimani & Kamalipour, 2021). Systematic reviews demonstrated that online learning can continue in the post-COVID-19 era (Lockee, 2021) and can be modified and combined with offline learning to create a blended learning method that schools can adopt, compensating for the shortcomings of both online and face-to-face learning (Jamilah & Fahyuni, 2022). There is likely to be increased demand for pedagogically sound and adaptable learning environments, as well as innovations in learning technology and design, in the post-COVID-19 era (Peimani & Kamalipour, 2021). Therefore, online exams are likely to remain widely used even as the impact of COVID-19 wanes. Previous studies examined the cognitive, affective, physiological, and social dimensions of test anxiety while focusing on traditional test structures. In addition, due to the widespread use of online learning environments, measurement tools are needed to examine students' online test anxiety. Therefore, this study aims to develop a scale that measures students' test anxiety levels in online exams.

Method

This study used a mixed-methods design combining qualitative and quantitative research methods. The mixed-methods approach enables researchers to strengthen the reliability and validity of the research and to compensate for the deficiencies of each method by combining them (Gültekin et al., 2020).

Participants

The participants consisted of 859 university students from three different universities in Turkey (male: n = 519, female: n = 340). All participants reported that they had taken an online exam at least once. Each stage of the research involved a different sample; information about the participants is given in detail at each stage. The research procedure was approved by the Ethics Review Board of Fırat University.

Data analysis

In the structuring phase of the research, the data obtained from the participants were analyzed using descriptive analysis. As a first step, the data collected via the interview protocols and reports were saved digitally on a computer without any alteration or correction. The participants were indexed as P1, P2, … P30 to ensure anonymity and the confidentiality of their personal information. The answers to the open-ended questions were then coded to identify topics, issues, similarities, and differences revealed through the participants' narratives (Braun & Clarke, 2006). After discussing the emerging meanings to agree on overemphasized or underemphasized themes (Shenton, 2004), another colleague was consulted for his insight into the emerging codes to reduce potential bias. The researcher applied an independent coding process to the data gathered from the interviews. Following Patton (1999), the expertise of an analyst working in the field of qualitative data analysis, other than the researchers, was utilized; the analysts consulted during the data analysis process had completed their Ph.D. in the field of Assessment and Evaluation in Education. For inter-rater agreement, the formula "[the number of agreements / (the number of agreements + the number of disagreements)] × 100" (Miles & Huberman, 1994) was used; the inter-rater agreement in the initial case was 88.3%. The codes were grouped under two headings: technical anxiety, and physiological and psychological anxiety. The analysis process included cyclical and continuous comparisons in the form of code development/refinement and theme development/refinement (Glaser & Strauss, 1967). Finally, as a result of discussions on analyst comments, the final themes were introduced (Richards & Morse, 2013). Because coding is an important step in analyzing qualitative data (Ose, 2016), Microsoft Excel was used for the qualitative data analysis, as suggested by Bree and Gallagher (2016) and Meyer and Avery (2009), since it allows systematic coding in a simple and user-friendly manner.
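The Miles and Huberman agreement formula quoted above reduces to a one-line computation. As a minimal sketch, the counts in the usage line are hypothetical, chosen only to show how a value near the reported 88.3% arises:

```python
def interrater_agreement(n_agree: int, n_disagree: int) -> float:
    """Percent agreement between two coders (Miles & Huberman, 1994):
    [agreements / (agreements + disagreements)] * 100."""
    return n_agree / (n_agree + n_disagree) * 100

# Hypothetical counts: 53 codes agreed on, 7 disputed.
print(round(interrater_agreement(53, 7), 1))  # 88.3
```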

Analyses in the quantitative evaluation, reliability, and validation phases of this study were performed using SPSS 22.0 and AMOS 20.0. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were performed to examine the internal structure of the survey. In addition, convergent and discriminant validity were examined. There were no missing data. EFA and CFA have assumptions that must be met: a linear relationship, normality, and no multicollinearity. In this context, the data were examined for normality and linearity. According to Tabachnick and Fidell (2007), skewness and kurtosis values must be between -2 and +2 for the data to be considered normally distributed; the normality assumption was met based on preliminary data analysis. No univariate or multivariate outliers were found using box plots and Mahalanobis distance. Furthermore, the variance inflation factor (VIF) was examined to determine whether multicollinearity exists in the data set; an acceptable VIF value is < 5.0 (Hair et al., 2010). No multicollinearity was found among the variables. Combining EFA and CFA is recommended because EFA, especially at the beginning of scale development, can account for "unanticipated, but substantively meaningful, factors influencing subsets of items or unanticipated cross-loadings" (Flora & Flake, 2017); in turn, CFA strengthens EFA results by replicating them on a separate sample. To determine the number of factors to extract, two statistical methods were used: Kaiser's rule (i.e., the number of eigenvalues greater than 1; Kaiser, 1960) and the scree test (Cattell, 1966). The factors were then extracted using principal component analysis (PCA) with promax oblique rotation, under the hypothesis that the factors are correlated to some degree because they are all related to the same construct.
Importantly, the aim of the EFA is to determine whether the factors extracted are consistent with the theoretical aspects they should reflect: in our case, the factors were expected to reflect technical anxiety, and physiological and psychological anxiety. Cronbach's alpha was used to determine the internal consistency of the factors, and item-total correlations were also computed. Model fit was evaluated against the following criteria: comparative fit index (CFI) > 0.95, non-normed fit index (NNFI) > 0.95, root mean square error of approximation (RMSEA) < 0.08, and standardized root mean square residual (SRMR) < 0.08 (Hu & Bentler, 1999). The analysis also involved assessing internal consistency reliability (Cronbach's alpha and composite reliability), convergent validity (average variance extracted, AVE, and factor loadings), and discriminant validity (the Fornell-Larcker criterion and the heterotrait-monotrait, HTMT, ratio of correlations criterion) (Fornell & Larcker, 1981; Hair et al., 2017a, b).
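As a minimal sketch of the internal-consistency computation mentioned above, Cronbach's alpha can be calculated directly from raw item scores; the data below are invented for illustration, not taken from the study:

```python
from statistics import pvariance

def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
    of the respondents' total scores).

    items: k lists, each holding one item's scores across all respondents.
    """
    k = len(items)
    item_var_sum = sum(pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Two perfectly parallel items give the maximum alpha of 1.0.
print(cronbach_alpha([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))  # 1.0
```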

Development of the Scale

The development process consisted of five phases: (1) planning, (2) structuring, (3) quantitative evaluation, (4) validation, and (5) convergent and divergent validity. Each phase is explained in detail below.

Phase 1: Planning

The first phase covered the planning process. In order to determine the similarities and differences between traditional and online test anxiety, the theoretical and experimental studies related to test anxiety in the literature were examined. The psychological dimension was considered the most important and common dimension in terms of test performance. In addition, early studies mentioned emotional, psychological and somatic, and behavior-related dimensions (see Dusek, 1980; Sieber, 1980). While developing the item pool for the scale, the section on anxiety disorders in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) was reviewed to identify the symptoms of anxiety. According to the DSM-5, anxiety and worry are associated with three or more of the following six symptoms: restlessness or feeling keyed up or on edge, being easily fatigued, difficulty concentrating or mind going blank, irritability, muscle tension, and sleep disturbance (American Psychiatric Association, 2021). This indicates that an online test anxiety scale needs to include items related to psychological and physiological anxiety symptoms.

In their studies, Block et al. (2008) and Wang et al. (2001) put strong emphasis on students' inexperience in online education, their limited knowledge and skills related to online exams, and the negative effects of these factors on students' test anxiety levels. In addition, Oyedele and Simpson (2007) pointed to technical anxiety, stating that people may experience a high level of anxiety or stress when they use new, unfamiliar tools and equipment. Also, Baker et al. (2015) argued that anxiety negatively affects individuals' logical judgments, decisions, and behaviors in many situations. Based on these statements, in addition to traditional test anxiety, individuals may experience technical anxiety in online environments. Therefore, it was decided that items related to psychological, physiological, and technical anxiety would be included in the draft version of the test anxiety scale for online exams.

Phase 2: Structuring

In the second phase of the study, interviews were conducted with 30 students (male: n = 16, female: n = 14; age: M = 22.1 years, SD = 2.33) who had already taken an online test, in order to identify the factors influencing test anxiety. The interviews took approximately 20 min each. The participants were in either their second or third year of undergraduate education. They were chosen specifically because their instructor had the impression that they had a high level of test anxiety. Due to the pandemic, the interviews were conducted online. After obtaining the participants' permission, the interviews were recorded. At the beginning of the interviews, each participant was given the definition of anxiety and asked whether they felt anxious before or during online exams. They all agreed that they had test anxiety. The participants were then asked two questions: the first about their physiological and psychological symptoms, and the second about technical anxiety symptoms. In this study, the interviews were used to develop items on online test anxiety. To ensure content adequacy, three experts' ratings were used to confirm that the developed items addressed both categories of online test anxiety triggers. To test the comprehensibility of the items, cognitive interviews were conducted with four additional people who were not involved in the interviews or expert ratings (age: M = 33.50 years; male: n = 2, female: n = 2); they were asked to think out loud while reading and answering the items. Content analysis was conducted to analyze the data. Figure 1 presents the findings related to the physiological and psychological anxiety symptoms.

Fig. 1

Findings related to the physiological and psychological anxiety symptoms

As seen in Fig. 1, the answers received from the participants showed that physiological and psychological anxiety symptoms in online exams consisted of eight subcategories. Participants most frequently reported physiological and psychological anxiety symptoms such as stress, panic, and sweating in online exams. Some example quotes are provided below:

P5 reflected the symptoms of physical anxiety during the online exam as follows: "As soon as the online exam started, I felt like a bottle of hot water was dumped on me. I feel that my breathing is accelerating and I cannot control it. I get anxious enough to forget what I already know." Similarly, participant P7 stated that he experienced physical anxiety during the online exam: "Just minutes before the online exam starts, my heart starts beating so fast, like it will get out of my body in seconds. During the exam, my palms get sweaty. This prevents me from focusing on my exam." In the APA Dictionary of Psychology, conditions such as sweating, tremors, dizziness, and rapid heartbeat are described as physical symptoms of anxiety. Therefore, there is significant evidence that the participants experienced physical anxiety during the online exam. In addition, some participants reported experiencing psychological anxiety during the online exam. Participant P11 described the symptoms of psychological anxiety during the online exam as follows: "I do not know what I am doing in the exam due to the stress I feel. Even if I read the questions several times, I do not understand them. I do not remember being that stressed in face-to-face exams. I experience an incredible panic. I really hate online exams." Moreover, participant P17 explained psychological anxiety in online exams with the following statements: "Exams always make me nervous. Especially in online exams, I panic because of the thoughts that I will not be able to keep up with the time or that the internet will be cut off. I think online exams are more stressful than face-to-face exams." It can be said that the participants had difficulty controlling their feelings of worry, which is explained by the psychological symptoms of anxiety (National Institute of Mental Health, 2022). Thus, it can be said that students who participate in online exams experience some physical and psychological anxiety.

In the structuring phase of the research, the participants' views on technical anxiety during the online exam were also examined. Figure 2 presents the findings regarding participants' views about technical anxiety.

Fig. 2

Findings related to the technical anxiety symptoms

As seen in Fig. 2, the answers received from the participants showed that technical anxiety in online exams consisted of five subcategories. Participants stated that they were most concerned about a lack of technical skills in online exams. Some sample quotes are given below:

Participant P11 stated that she had technical anxiety in online exams: "I am stressed by constantly thinking that my internet connection will be cut off during the online exams. I always worry about what to do if my computer breaks down before the exam. In addition, in some exams, although I click on the answer, the system does not accept it. I think online tests are a big problem for students, with no solution." Similarly, participant P7 stated that he experienced technical anxiety during the online exam: "My computer froze in the last exam and I couldn't do anything. I don't understand much about computers. I had to restart it, and therefore my time was spent rebooting it, so not enough time was left for the exam. I know many people have similar problems. Online exams are a disaster for students." All the situations expressed by the participants were concerns arising from technical issues. Internet connection problems, lack of technical skills, system problems, and difficulties in reading from the screen during the online exam caused participants to experience anxiety in online exams. Based on the literature review and the interview results, strong evidence was obtained for including items related to physiological, psychological, and technical anxiety in the scale. The draft version of the scale consisted of eleven items for the psychological anxiety dimension, nine items for the physiological dimension, and ten items for the technical anxiety dimension, for a total of 30 items. At the end of the second phase, expert views were obtained: the scale was reviewed by three experts from the field, and a language expert checked the items' reading level. Moreover, twelve undergraduate students were asked to review the draft version. The results of these cognitive interviews showed that one item overlapped in content with another. Based on the obtained views, this item was deleted from the psychological anxiety dimension. There were 26 positive and three negative items in the revised version of the scale. It was designed as a five-point Likert-type scale, where 1 = never, 2 = rarely, 3 = sometimes, 4 = often, and 5 = very often.

Phase 3: Quantitative evaluation

In this phase, exploratory factor analysis (EFA) was conducted to examine the factor structure of the revised version of the scale. The participants consisted of 442 students (male: n = 266, female: n = 176; age: M = 22.02 years) from three different universities. The participants were sent a form via Google Forms that included questions about their demographic information as well as the items of the scale. They were asked to answer by considering how they feel or think before and during an online exam. Participants stated that they used computers (n = 229), smartphones (n = 211), and tablets (n = 2) during online exams. Table 1 provides descriptive information about the participants.

Table 1 Demographic information about the participants of Phase-3

Before conducting the EFA, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test were computed to check whether the data were suitable for factor analysis. The KMO value was 0.950 and Bartlett's test results were χ² = 6953.87, df = 190, p < 0.001. According to these results, the data were suitable for factor analysis. A factor analysis was performed with varimax rotation. According to Bryant and Yarnold (1995), rotation refers to a process in which eigenvectors (factors) are rotated to reach a simple structure. The results revealed two factors with eigenvalues higher than one. Table 2 provides the factor loadings, eigenvalues, factor variance, cumulative variance, and the Cronbach's alpha value of the scale.
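The Bartlett statistic reported above can be reproduced from the determinant of the item correlation matrix using the standard formula. The sketch below assumes that determinant as an input, since it is not reported in the study; the usage line only checks the degrees of freedom, which follow from the 20 items:

```python
import math

def bartlett_sphericity(n: int, p: int, det_r: float) -> tuple[float, int]:
    """Bartlett's test of sphericity for n respondents and p items.

    det_r: determinant of the p x p item correlation matrix R.
    chi^2 = -((n - 1) - (2p + 5) / 6) * ln|R|, with df = p(p - 1) / 2.
    """
    chi2 = -((n - 1) - (2 * p + 5) / 6) * math.log(det_r)
    df = p * (p - 1) // 2
    return chi2, df

# With n = 442 respondents and p = 20 items, df = 190 as reported.
print(bartlett_sphericity(442, 20, 1.0)[1])  # 190
```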

Table 2 The EFA results

According to the EFA results, nine items were dropped because their correlation values were less than 0.20. As a result, a total of 20 items remained (six items for the physiological dimension, six for the psychological dimension, and eight for the technical dimension). The EFA resulted in a two-factor structure with eigenvalues higher than one, explaining 63.4% of the variance. The scree plot also indicated a two-factor structure (Fig. 3).

Fig. 3

Scree plot of 20-item version of the scale

In Fig. 3, the eigenvalues after the second point were less than one and close to each other, indicating that the scale had two factors. The first factor explained 55.43% of the total variance and the second explained 8%, for a total explained variance of 63.43%. The items' factor loadings ranged between 0.52 and 0.85 for the overall measurement. The factors were named based on the literature, the items, and expert views. According to the findings, the items related to physiological and psychological anxiety symptoms clustered together; thus, the first factor was named the physiological and psychological anxiety (PPA) factor. The second was named the technical anxiety (TA) factor, since its items were related to technical anxiety symptoms. The Cronbach's alpha values for the whole scale and the two factors were 0.98, 0.95, and 0.89, respectively.

Phase 4: Reliability and validation

The reliability and validation phase of the study involved 387 students who completed the survey (male: n = 237, female: n = 150; age: M = 22 years, SD = 2.99). Participants declared that they had taken at least one course exam online (M = 1.07, SD = 0.251). All participants declared that they had a stable internet connection at home. In addition, the majority of the participants reported that they took online exams with a computer. Table 3 provides descriptive information about the participants.

Table 3 Demographic information about the participants of Phase-4

The data obtained from these students were used in descriptive analysis, CFA, composite reliability, convergent validity and discriminant validity analyses. Descriptive analysis results of the data obtained from these students are given in Table 4.

Table 4 Descriptive analysis results

As a result of the analysis, the mean of the scale was 1.48 (SD = 0.506). In terms of sub-dimensions (factors), the mean of PPA was 1.34 (SD = 0.497) and the mean of TA was 1.69 (SD = 0.658). The skewness and kurtosis values of the scores obtained from the scale were within the interval of ±2.
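The ±2 screen applied here can be sketched with plain moment formulas (population form); the sample data in the usage line are invented for illustration:

```python
from statistics import mean, pstdev

def skewness(x: list[float]) -> float:
    """Third standardized moment (population form)."""
    m, s = mean(x), pstdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * s ** 3)

def excess_kurtosis(x: list[float]) -> float:
    """Fourth standardized moment minus 3 (population form)."""
    m, s = mean(x), pstdev(x)
    return sum((v - m) ** 4 for v in x) / (len(x) * s ** 4) - 3

def within_normal_bounds(x: list[float], bound: float = 2.0) -> bool:
    """Tabachnick and Fidell's rule: both statistics within +/- bound."""
    return abs(skewness(x)) <= bound and abs(excess_kurtosis(x)) <= bound

# A symmetric toy sample passes the +/- 2 screen.
print(within_normal_bounds([1, 2, 3, 4, 5]))  # True
```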

Confirmatory factor analysis

Although EFA is widely used in developing psychological scales, there are many contradictory positions regarding EFA, including which extraction and rotation methods are acceptable and how to decide the number of factors (Tabachnick & Fidell, 2007). Therefore, it is suggested to conduct further analyses to confirm the results of the EFA (Osborne & Fitzpatrick, 2012). For this study, a confirmatory factor analysis (CFA) was performed. Although Schmitt (2011) stated that the same data set may be used for CFA to test the results of EFA, there are opposing views as well (Schumacker & Lomax, 2010). For instance, it is suggested to split the data set into two parts if it is large enough to conduct both EFA and CFA, or to collect data from a different sample for the CFA after performing the EFA (Schumacker & Lomax, 2010). Since the purpose of the CFA is to test whether the structure obtained in the EFA is reliable (Brown, 2006), the data collection procedure was repeated with a different sample group with characteristics similar to the sample on which the EFA was performed (Fig. 4).

Fig. 4

Confirmatory model

Only one modification was added, between the error terms e3 and e4. The item loadings ranged between 0.63 and 0.80 for the physiological and psychological anxiety factor and between 0.54 and 0.91 for the technical anxiety factor. The standardized parameter estimates and t-values of the CFA were significant (p < 0.001). The fit indices are provided in Table 5.

Table 5 Results of CFA

According to the table, the χ²/df value was 3.242; the CFI, AGFI, GFI, IFI, and NFI values were close to one; and the RMSEA and SRMR values were smaller than 0.080. Given the sample size, the chi-square value indicated an acceptable fit (Kline, 2005). These values indicate a good fit between the model and the observed data (Schreiber et al., 2006). The results confirmed that the scale has a two-factor structure. In addition, the t-values for the items in the scale were between 11.13 and 20.07 and were significant (p < 0.01).

Internal consistency reliability

Internal consistency reliability was used to measure the reliability of the survey items in each construct. Internal consistency reliability is achieved when all items of a measure reflect the same underlying construct (Myrtveit & Stensrud, 2012). Cronbach's alpha (α) and composite reliability are two indicators of internal consistency reliability. The recommended level of α is more than 0.70, and the composite reliability value should be between 0.70 and 0.95 (Hair et al., 2017a, b). The correlation between the technical anxiety and the physiological and psychological anxiety constructs, the Cronbach's alpha values, and the composite reliability values are presented in Table 6.
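Composite reliability follows a standard formula over standardized factor loadings; the loadings in the usage line below are hypothetical, not the study's estimates:

```python
def composite_reliability(loadings: list[float]) -> float:
    """Composite reliability: (sum of loadings)^2 divided by
    ((sum of loadings)^2 + sum of error variances, 1 - loading^2)."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

# Hypothetical four-item factor with uniform 0.8 loadings.
print(round(composite_reliability([0.8, 0.8, 0.8, 0.8]), 2))  # 0.88
```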

Table 6 Correlation Matrix, Cronbach’s alpha, and Composite Reliability

Based on Table 6, technical anxiety and physiological and psychological anxiety correlated significantly with each other. The Cronbach's alpha values for the whole scale and the two factors were 0.94, 0.90, and 0.93, respectively. In addition, the composite reliability values for PPA and TA were 0.94 and 0.90. These results indicate that all items in this survey were reliable, as each reflected its own underlying construct.

Convergent validity

Convergent validity was used to measure the degree of correlation between items in the same construct (Campbell & Fiske, 1959). Convergent validity is achieved when items in the same construct are strongly correlated with each other (Bagozzi & Yi, 2012). Factor loadings and average variance extracted (AVE) are two indicators of convergent validity. To achieve convergent validity, the AVE of each construct should exceed 0.50 (Hair et al., 2017a, b). The factor loadings and AVE values are shown in Table 7.
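The AVE criterion reduces to the mean squared standardized loading. A minimal sketch, with hypothetical loadings in the usage line:

```python
def average_variance_extracted(loadings: list[float]) -> float:
    """AVE: mean of squared standardized factor loadings.
    Convergent validity is supported when AVE exceeds 0.50."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical loadings: an AVE of 0.55 clears the 0.50 cut-off.
print(round(average_variance_extracted([0.8, 0.7, 0.72]), 2))  # 0.55
```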

Table 7 Factor Loading and Average Variance Extracted (AVE) Values

Based on Table 7, all items in the same construct were strongly correlated with each other, suggesting that the survey items of the study have good convergent validity.

Discriminant validity

Discriminant validity was used to measure the degree of correlation between items in different constructs (Campbell & Fiske, 1959). Discriminant validity is achieved when items in a particular construct are not highly correlated with items in other constructs (Hulland, 1999). The Fornell-Larcker criterion and the heterotrait-monotrait (HTMT) ratio of correlations criterion are two indicators of discriminant validity. To achieve discriminant validity, the square root of each construct's AVE should be higher than its correlation with any other construct, and the HTMT value should be lower than 0.90 (Hair et al., 2017a, b).
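Both criteria reduce to simple arithmetic on AVEs and item correlations. The sketch below uses invented input values throughout (the study reports only the final HTMT of 0.674, not the underlying correlations):

```python
import math
from statistics import mean

def fornell_larcker_ok(ave_a: float, ave_b: float, corr_ab: float) -> bool:
    """Fornell-Larcker: sqrt(AVE) of each construct must exceed
    the construct's correlation with the other construct."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(corr_ab)

def htmt(within_a: list[float], within_b: list[float],
         between: list[float]) -> float:
    """HTMT ratio: mean between-construct item correlation divided by
    the geometric mean of the mean within-construct correlations."""
    return mean(between) / math.sqrt(mean(within_a) * mean(within_b))

# Hypothetical AVEs of 0.64 against an inter-construct correlation of 0.60.
print(fornell_larcker_ok(0.64, 0.64, 0.60))  # True
# Hypothetical item correlations: HTMT of 0.5 is below the 0.90 threshold.
print(htmt([0.6, 0.6], [0.6, 0.6], [0.3, 0.3]) < 0.90)  # True
```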

Based on Table 8, the square root of each construct’s AVE was larger than its correlation with the other construct. This means that the items in the PPA construct were not highly correlated with the items in the TA construct. In addition, the HTMT value (0.674) was below the 0.90 threshold. Therefore, we concluded that no discriminant validity issue existed in this study.

Table 8 The Discriminant Validity Results

The final version of TAS-OE is provided in Table 9. It has 20 items with two factors. The physiological and psychological factor has 12 items (items 1–12) and the technical anxiety factor has eight items (items 13–20). There is only one reverse-coded item (item 13). The scale is a five-point Likert type, where 1 = never, 2 = rarely, 3 = sometimes, 4 = often, and 5 = very often. For analysis, a total score is calculated; a high score indicates a high level of test anxiety.

Table 9 The final version of TAS-OE

Discussion

The goal of this study was to develop and validate TAS-OE to assess students’ test anxiety levels in online exams. The study consisted of four phases. In the first phase, an extensive literature review was conducted and an item pool was developed based on the literature. In the second phase, interviews were conducted with students with high levels of test anxiety, who were asked about their anxiety experiences before and during online exams. The interview data were combined with the results of the literature review, and a draft version of TAS-OE with 29 items was developed. In the third phase, exploratory factor analysis (EFA) was conducted to examine the factor structure of the scale. The EFA, conducted with 442 cases, resulted in a two-factor structure; nine items were dropped, leaving 20 items. In the fourth phase, this structure was tested through confirmatory factor analysis. Overall, the scale showed acceptable fit to the data and loadings in line with the literature, expert views, and students’ real anxiety experiences in online exams. The final version of the scale consists of 20 items with two factors: the physiological and psychological anxiety factor with twelve items and the technical anxiety factor with eight items.

The two-factor structure is consistent with previous studies revealing that test anxiety is a multi-dimensional construct (Alibak et al., 2019; Putwain & Daniels, 2010). More specifically, the physiological and psychological anxiety factor of TAS-OE coincides with the psychological and somatic dimensions in questionnaires related to exam anxiety (Mowbray et al., 2015). In addition, some studies on anxiety (Sarason, 1978; Barnes et al., 2019) consider physical and psychological symptoms together. Moreover, it has been suggested that physical and psychological symptoms form a common structure influenced by personality traits (Spangler, 1997). In this context, considering the physical and psychological anxiety dimensions together for online exams, as in the current research, is compatible with the literature. The results of Folk’s study (2010) also support the technical anxiety factor of TAS-OE. There exists only a limited number of scales measuring students’ online test anxiety levels (Alibak et al., 2019). A special feature of TAS-OE that differentiates it from other scales is that it includes items related to technical anxiety. The reason to include such items is that the use of technology in the learning process may increase students’ anxiety levels (Matsumura & Hann, 2004), which negatively affects learning outcomes (Brown et al., 2004). Therefore, TAS-OE will enable researchers to identify the online test anxiety levels of university students in terms of physiological, psychological, and technical aspects and to determine ways to diminish the negative effects of test anxiety on academic performance. Although the impact of COVID-19 has decreased worldwide, the place and importance of online tests in our lives cannot be denied. The current research provides a reliable tool for determining online test anxiety and thus for improving the effectiveness and efficiency of online learning.

Theoretical implications

This study conceptualized the structure of online test anxiety. TAS-OE extends existing work on online exam anxiety or stress in the following ways. First, TAS-OE is not tied to online exams conducted with specific technologies and is therefore also applicable to future, anticipated technologies. The items referring to online test anxiety in TAS-OE indicate that online tests that individuals are not yet accustomed to can cause anxiety. Second, TAS-OE addresses online exams as an ongoing process involving the integration of technology into all aspects of the education and training process. This process perspective is reflected in two ways: (a) TAS-OE includes a unique subscale describing anxiety triggers related to online exams, and (b) items are formulated to incorporate a process perspective, mostly by using verbs such as “feel” or “experience”, which describe feelings, skills, and behaviors. Finally, TAS-OE differs somewhat from existing test anxiety constructs in that it comprises (1) technical skill anxiety (specific to the nature of online exams) and (2) psychological and physiological anxiety (similar to the established construct of test anxiety). Theoretically, the research conceptualized online test anxiety in a concrete way based on qualitative and quantitative data.

Practical implications

As online exams are increasingly used in educational environments, attitudes and fears towards these exams should be continuously monitored with effective instruments. TAS-OE can be used as such an instrument by educators or students to identify the “top triggers” of online test anxiety in educational institutions. Completing the TAS-OE can help individuals develop measures to counteract their online test anxiety. The current study revealed that online test anxiety is associated with technical skill, cognitive, and behavioral indicators. High online test anxiety can make it difficult for individuals to have positive learning experiences. In conclusion, the 20-item TAS-OE has been shown to provide satisfactory reliability, composite reliability, criterion validity, content validity, discriminant validity, and convergent validity.

Limitations and future research

There were several limitations to the study. The first limitation concerns students’ high levels of test anxiety in online exams. Since this development and validation study was conducted during the COVID-19 pandemic, students’ anxiety about the pandemic may have had an impact on the findings. Therefore, future research should examine the effects of students’ perceptions of the pandemic on their test anxiety in online exams. In addition, in order to further ensure the validity and reliability of TAS-OE, this study should be replicated in learning environments in which a blended learning approach is adopted. The second limitation concerns technical anxiety. Individuals who need to control and manage technological devices are likely to experience anxiety when they have insufficient knowledge about technology and how to use it. Therefore, it is suggested to examine the structure of online exams to determine which factors trigger students’ anxiety. The third limitation concerns the sample and the sample size of the study. The data were collected from university students from different majors. Future research should consider replicating the study with participants from different grade levels, majors, and cultures and with a larger sample size, which would enable researchers to further examine TAS-OE’s psychometric properties. Finally, the participants were recruited through convenience sampling; future studies should consider using purposeful sampling in order to identify other possible factors that influence test anxiety in online exams.