1 Introduction

Figures reveal that the science, technology, engineering and mathematics (STEM) sector suffers a loss of diversity at the higher stages of the career (Blickenstaff, 2005; Diekman et al., 2010; Sadler et al., 2012).

The latest figures from the European Institute for Gender Equality (EIGE) (2021) show that the representation of women in higher STEM education in Europe, and specifically in Spain (ISCED levels 5–8), does not reach parity. They also show the extent of horizontal segregation in STEM higher education. In Spain in 2019, male students of Information and Communication Technologies (ICT) made up 4.6% of the university student body, while female ICT students accounted for 0.7%. In software and application development and analysis, the figures were 1.8% for men and 0.3% for women, and in electronics and automation, 2.8% for men and 0.4% for women. The pattern is similar in other European countries. In Germany, male ICT students represented 5.5% of the student body and female ICT students only 1.5%. In software and application development and analysis in Estonia, men represented 5.6% and women only 2.2%. In Greece, male students of electronics and automation accounted for 5.9% and female students for only 1.3%. The disparity is also observed in mathematics: in Ireland, male mathematics students represented 0.4% of the student body and female students only 0.1%. In education and health, the disparity across Europe runs in the opposite direction, with women overrepresented (European Institute for Gender Equality (EIGE), 2021). Strikingly, when the student body is analysed as a whole rather than by field, there is parity: in Spain in 2019, 53.7% of university students were women and 46.3% were men. The problem, therefore, is not that more men than women study at university. In addition, Spanish universities do not discriminate on the basis of gender in admission to their degrees.
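The EIGE percentages above are shares of the whole university student body, not of each field. The short Python sketch below, offered only as an illustration, converts such a pair of shares into the proportion of women within the field, assuming that the men's and women's figures together cover all students of that field. For Spanish ICT, for instance, 0.7 / (4.6 + 0.7) ≈ 0.13, so roughly 13% of ICT students were women. The dictionary and function names are illustrative and not part of any EIGE tool.

```python
# Shares of the whole Spanish university student body in 2019 (EIGE, 2021),
# as quoted in the paragraph above: field -> (men %, women %).
SPAIN_2019 = {
    "ICT": (4.6, 0.7),
    "Software and application development and analysis": (1.8, 0.3),
    "Electronics and automation": (2.8, 0.4),
}


def female_share_within_field(men_pct: float, women_pct: float) -> float:
    """Proportion of women among the students of a single field.

    Assumption: men_pct + women_pct covers every student of the field
    (students outside the men/women categories are ignored).
    """
    return women_pct / (men_pct + women_pct)


for field, (men, women) in SPAIN_2019.items():
    print(f"{field}: {female_share_within_field(men, women):.1%} women")
```

Applied to the other Spanish fields, the same ratio gives about 14% women in software development and 12.5% in electronics and automation.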

In conclusion, the STEM education sector suffers from a lack of gender diversity, mainly an underrepresentation of women (Jacobs et al., 2017), and this underrepresentation is particularly notable in engineering (Cvencek et al., 2021; Dou et al., 2020; Keku et al., 2021; Moote et al., 2020; Snyder et al., 2018).

Studies also show that the gender gap does not originate in biology, in innate traits that might differentiate people by sex, or in any inherent predisposition of each sex or gender towards particular professions (Bourdieu, 1984; Cheryan et al., 2013; Corbett & Hill, 2015; Nguyen & Ryan, 2008; Nguyen & Riegle-Crumb, 2021; O’Brien & Crandall, 2003). Rather, it originates in the social constructions that societies forge according to the worldview of the people who compose them (Leslie et al., 2015; Master et al., 2016; Thébaud & Charles, 2018).

As Ertl et al. (2017) point out, the proportion of women in these fields in the European Union has remained constant at approximately 25% in recent decades, far from parity. Likewise, as Talley and Martinez Ortiz (2017) point out, women account for less than 20% of engineering and computer science degrees and constitute less than 15% of all engineers working in the US.

There are false beliefs that women are more attracted to studies associated with caring for others or to literary studies, and, conversely, the erroneous belief that men are attracted to more technical and rational professions concerned with building and producing things. This justification is, however, reductionist and binary (Diekman et al., 2010; Guo et al., 2018; Sikora & Pokropek, 2011; Su & Rounds, 2015): it divides professions into two simplistic categories without considering the presence and importance of the environment.

Thus, the gender gap in STEM areas is a global problem caused by different factors (Lent et al., 1994; Osborne et al., 2003). Several studies have investigated the influence of stereotypes on decisions about higher education studies (Cadaret et al., 2017; García-Holgado et al., 2019, 2020a, b; Makarova et al., 2019; Powell et al., 2012; Tomassini, 2021; Verdugo-Castro et al., 2019). Their starting point is that different obstacles and barriers generate segregation in tertiary studies, and that identifying these barriers makes it possible to tackle them and reduce the gender gap.

The Social Cognitive Career Theory (Lent et al., 1994; Lent & Brown, 1996) is taken as the reference for the theoretical framework. Its authors argue that internal and external factors condition individuals as they acquire knowledge and pursue higher education, so attention must be paid to the immediate context and to the individual’s external experiences and influences. In addition, according to Bourdieu (1980), social representations must be considered as potential obstacles. These theories have served as a basis for subsequent work, and several studies are currently being developed in this field. Recent research aims to reduce Stereotype Threat and the Leaky Pipeline (Beasley & Fischer, 2012; Blickenstaff, 2005; Goulden et al., 2009).

Kang et al. (2019) emphasise the importance of preventing stereotyping and erroneous patterns among teachers by focusing on future career prospects. Their results reveal that non-inclusive language, the choice of heteronormative teaching material, and communication style can exclude part of the student body, particularly girls.

Stereotypes and prejudices formed and acquired early encourage gender discrimination, and inequalities between men and women are thus reinforced throughout the different stages of life. In the academic sphere, the content of textbooks and the hidden curriculum must be carefully considered, since they are mechanisms for teaching. Some books transmit gender discrimination through their representation of gender roles, and as a result students may come to think and act stereotypically towards people depending on their gender because of the content they have been taught. Regarding research and interventions in the field, Papadakis (2018) conducts a qualitative study of textbook content to identify sexist elements and gender stereotypes in materials used by computer science teachers and students in the general lyceum of Greece. Papadakis et al. (2018) also address female underrepresentation in the educational field of information science.

In the school context, Brauner et al. (2018) implement an initiative to enhance interest in STEM through robotics with German schoolchildren aged 10–13. In the initial phase, the children are asked to draw a computer scientist. The drawings reveal social stereotypes about computer scientists: the children mostly draw men with a stereotypically nerdy character in solitary situations.

Further research on gender differences in the IT (Information Technology) sector has been conducted in Denmark (Borsotti, 2018). The study empirically investigates the main socio-cultural barriers to female participation in the Bachelor’s Degree in Software Development at Copenhagen’s University of Information Technology, and its participants attribute the gender gap to stereotypes.

Gender stereotypes also influence intrinsic factors, such as students’ self-concept. Ertl et al. (2017) aim to determine the academic self-concept of female university students enrolled in STEM-LPF degrees, i.e. degrees with a marked under-representation of women (30% women or fewer). The interviews show that the family factor is ambiguous. In this study, all parents had STEM-related backgrounds, so they could support their daughters in the STEM field and stimulate their cognitive development. However, such support may also convey an attribution of lower STEM skills, which may in turn influence the daughters’ academic self-concept in STEM.

The study by Olmedo-Torre et al. (2018) also analyses the influences on female STEM students. The authors divided women into two groups: those studying Computing, Communications, and Electrical and Electronic Engineering (CCEEE women) and those studying other, non-CCEEE degrees. The results reveal that secondary school teachers and the peer group supported the women’s decision to study, whereas their families did not support the decision at these levels. In addition, CCEEE women reported less support from their family, teachers and peer group than non-CCEEE women. This affected their self-concept, as CCEEE women felt less capable than their male peers at the beginning of their studies. Finally, the causes they attributed to female under-representation in STEM were: social stereotypes (31.5%), immediate environment (14.5%), women do not like engineering (11.03%), lack of information in high school (8.67%), stereotypes in education (8.18%), lack of female role models (7.93%), gender-biased toys (7.43%), job discrimination (7.19%) and engineering being difficult (5.82%).

According to García-Holgado et al. (2020a, b), the support received before entering a STEM degree is a determining factor. Peers and family are the most important perceived sources of support, while teaching staff and institutional support receive lower ratings. Moreover, for female students, the support of their friends, schoolmates and school matters.

Some authors extend the research to the learning environment. Reich-Stiebert and Eyssel (2017), in a study conducted in Germany, explored the influence of gender stereotypes on learning with a robot in higher education. They concluded that female participants outperformed male participants in typically male tasks and vice versa: participants paid more attention to tasks that did not correspond to their gender and obtained better learning outcomes. Finally, Finzel et al. (2018) also aimed to combat the gender stereotypes that contribute to the gender gap through an intervention to increase motivation for STEM studies, conducted with students aged 16–18 within the “make IT” mentoring programme.

In this context, it should be stressed that different techniques and instruments can be used to understand the gender gap and the causes of segregation, and that each methodological option is designed for a specific population and predefined objectives.

This study aims to determine how university students view higher education in science, technology, engineering, and mathematics according to gender, in order to identify acquired biases and design measures to remove them. For this reason, a specific instrument aimed at university students is required.

To make the best use of existing resources, instruments designed to address the gender gap in STEM programmes in higher education were analysed (Verdugo-Castro et al., 2019). An extensive and exhaustive search of databases such as Web of Science, Scopus, ERIC, Dialnet and Google Scholar identified 75 publications that referred to questionnaires related to the phenomenon (https://cutt.ly/bPKDRY). After reading the publications, only 18 indicated that the referenced instrument had undergone a validation process, and of these 18, only 13 were closely related to the topic of study. However, there were two main reasons why the listed instruments were considered not to meet the study’s objective. First, in some cases, although the questionnaire was identified in the publication, it could not be found in the databases and its items were not available. Second, the instruments analysed were not designed for the Spanish and European higher education framework, nor aimed at university students with a research focus on opinion and bias detection. The instruments closest to the topic of the study were contextualised in the American and Chinese environments or focused exclusively on specific STEM domains, such as biology or computer science. They also targeted early childhood and adolescence rather than university ages.

No validated instruments analysing opinion on the topic of study among university students were found in the European context. The review results were therefore revisited to identify data collection instruments associated with gender stereotypes and gender ideology, and instruments that had initially been discarded because they were not fully aligned with all STEM disciplines, but which addressed bias and ideology, were reconsidered.

Finally, five publications containing questionnaires that analysed gender ideology and stereotypes were identified (Banchefsky & Park, 2018; Duncan et al., 2019; Godwin, 2014; López Robledo, 2013; Rossi Cordero & Barajas Frutos, 2015) and could be used to construct a new instrument, the Questionnaire with university students on STEM studies in Higher Education (QSTEMHE).

None of the five instruments explicitly addresses the gender gap in STEM areas; instead, they are aimed at specific fields or at target populations other than the main target of this research, the university population. The Banchefsky and Park (2018) instrument analyses gender ideologies, both negative and positive, and gender stereotypes concerning science. The Duncan et al. (2019) questionnaire delves into heteronormative attitudes and tolerance towards gender. The Godwin (2014) questionnaire is based on the Social Cognitive Theory of Career Development (Lent et al., 1994) and aims to study Critical Engineering Agency (CEA); it addresses physics identity, mathematics identity, science identity, and agency beliefs. The López Robledo (2013) questionnaire analyses attitudes towards technology, opinions about Information and Communication Technology disciplines, the obstacles that can be encountered in ICT studies, and the importance of having a reference figure or role model. Finally, the Rossi Cordero and Barajas Frutos (2015) instrument has three dimensions: an individual dimension, with the categories interests, values, and choice factors; a STEM studies dimension, with the categories motivation, role models and influences, and self-efficacy; and a challenges dimension, with the categories facilitators, obstacles, and opportunities.

Items were therefore selected so as to retain only those relevant to finding out university students’ opinions about higher studies in STEM according to gender. All the authors of this article selected the items by consensus, assessing their suitability for studying the gender gap in STEM in light of the theoretical framework.

Finally, the design process included a pilot phase in an exploratory study and a validity and reliability study. After validation, the final version of the questionnaire was drafted, structured in five scales: Gender Ideology, Perception and Self-perception, Expectations about Science, Attitudes, and Interests.

The Questionnaire with university students on STEM studies in Higher Education (QSTEMHE) is a new instrument that helps researchers analyse university students’ perceptions of STEM studies in order to detect problems and propose new approaches to reducing the gender gap in these areas. To ensure the instrument’s validity and reliability, a methodological procedure of empirical validation was followed, and it is detailed here to facilitate an understanding of the design of the questionnaire and its validation.

This paper is divided into five parts. Section 2 describes the methodology applied to design and validate the instrument. Section 3 presents the data analysis and results. Section 4 discusses the results obtained. Finally, Section 5 summarises the main conclusions of the work.

2 Methods

A non-experimental quantitative design (Sarrado et al., 2004) was used to validate the QSTEMHE instrument. The instrument aims to find out the opinions of university students about higher studies in science, technology, engineering and mathematics, according to gender. The phases applied are shown in Fig. 1.

Fig. 1 Workflow of study design and data analysis. Source: Own production

The instrument was empirically validated through Reliability Analysis and Exploratory Factor Analysis. Once the theoretical construct had been defined on the basis of the sources that inspired the QSTEMHE, the statistical procedure was applied. A first reliability analysis was run to check the suitability of the theoretical construct, and it showed that the construct did not work at the empirical level. Exploratory Factor Analysis was therefore used to reduce the number of factors, after which a second reliability analysis was carried out on an adapted construct that respected the principles of the theoretical dimensionality. In a third reliability analysis, two scales were adjusted after checking the homogeneity of the items, yielding a final version of the instrument with five dimensions. To conclude the statistical analysis, hypothesis tests were conducted on the five scales with the predictor variables.

2.1 Design of the instrument

The instrument’s items are based on previous questionnaires designed by other authors (Banchefsky & Park, 2018; Duncan et al., 2019; Godwin, 2014; López Robledo, 2013; Rossi Cordero & Barajas Frutos, 2015).

The topic addressed by QSTEMHE is the opinion that university students have about higher studies in science, technology, engineering and mathematics, according to gender.

The instrument was initially designed with 66 items (Verdugo-Castro et al., 2020): 37 ordinal items that have undergone a process of empirical validation (criterion variables), 5 open questions and 24 questions asking about socio-demographic variables (predictor variables).

First, before applying the validation process, seven dimensions were defined based on theory and the dimensions identified in the questionnaires of the selected items (Banchefsky & Park, 2018; Duncan et al., 2019; Godwin, 2014; López Robledo, 2013; Rossi Cordero & Barajas Frutos, 2015). These dimensions are Gender Ideology, Attitudes, Perceived Image, Interests, Women’s Skills, Perception and Self-Perception and Expectations about Science. These have been subjected to a statistical analysis process to validate the construct.

2.2 Pilot study

The approval of the Bioethics Committee of the University of Salamanca (Spain) was obtained for this research before launching the pilot study. This procedure was necessary because the data collection was carried out with human subjects. Finally, a favourable report was obtained with registration number 557.

The exploratory study was conducted in 2020 through an online application, a customised version of LimeSurvey. The questionnaire was distributed by email to mailing lists of professors at Spanish universities, using a snowball approach. Data collection was limited by the contingency caused by the COVID-19 health crisis: the pilot study had initially been scheduled to take place face-to-face in classrooms, but this proved impossible, and the move to virtual teaching required the process to be carried out online.

2.3 Study sample

The final sample consisted of 115 undergraduate students from Spanish universities (106 women, 8 men, and one person who does not identify as a man or a woman). The average age was 20, and ages ranged from 18 to 34. There were 22 first-year students, 36 second-year students, 47 third-year students, 9 fourth-year students, and 1 fifth-year student. The participants came from Social and Legal Sciences (Pedagogy, Speech Therapy, Social Education), Health Sciences (Nursing and Pharmacy), Sciences (Chemistry), and Engineering and Architecture (Industrial Design Engineering). Finally, the participants were from eight different Spanish universities: University of Salamanca, Universitat de València, Universitat Politècnica de València, Universidad de Alcalá, Universidad Rovira i Virgili, Universidad Católica Santa Teresa de Jesús de Ávila, Universidad de Granada and Universidad de Sevilla.

3 Data analysis and results

The analyses were divided into two blocks. First, empirical validation was carried out with the ordinal items (criterion variables), as established in the protocol. Second, hypothesis tests were applied to the predictor variables (socio-demographic variables) based on the validated scales.

First, a statistical validation process consisting of an Exploratory Factor Analysis and three Reliability Analyses was applied to study the dimensionality of the instrument and the validity and reliability of the items. Exploratory Factor Analysis is a data reduction technique that finds homogeneous groups of variables, i.e., dimensions, within a larger set of variables (Akaike, 1987; Lloret-Segura et al., 2014; McCoach et al., 2013).

After the pilot study (Verdugo-Castro et al., 2020), the mean and standard deviation of each item were calculated. Table 1 shows the results, with the items identified by their original numbering in the questionnaire.

Table 1 Mean and standard deviation of ordinal items

Although average agreement is low for the idea that a woman who decides to dedicate herself to a traditionally male field will have to adopt male customs and behaviours to be successful, the slightly positive trend for this statement is worrying. Participants also believe that men can act as they wish in the workplace without responding to stereotypical patterns; surprisingly, however, the variability is higher for the corresponding item referring to women. For item 42, the standard deviation is large, which suggests that some participants do consider those who study these subjects to be nerds. Furthermore, although participants reject the idea that STEM subjects are more masculine than others, the standard deviation for this item is higher. Finally, participants feel that men who do not conform to the male canon are not good role models, and the results are similar for women. In addition, there is a belief that women and men have equal employment opportunities in ICT careers.
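As an illustration of the descriptive statistics in Table 1, the sketch below computes an item's mean and sample standard deviation (n − 1 denominator, the form statistical packages such as SPSS typically report). The item labels, the response data and the 1–5 agreement scale are assumptions made for the example; they are not the pilot study's data.

```python
from statistics import mean, stdev

# Hypothetical responses on an assumed 1-5 agreement scale; NOT the pilot data.
responses = {
    "item_A": [1, 1, 2, 5, 4, 1, 3, 5],   # widely spread answers -> large SD
    "item_B": [1, 1, 1, 2, 1, 1, 2, 1],   # answers concentrated at the low end
}


def describe(scores: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) of one item."""
    return mean(scores), stdev(scores)


for item, scores in responses.items():
    m, sd = describe(scores)
    print(f"{item}: M = {m:.2f}, SD = {sd:.2f}")
```

Two items can share a low mean while differing sharply in spread, which is the pattern discussed above for items whose standard deviation signals divided opinion.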

In addition, the correlations between the variables were calculated. A correlation coefficient indicates the strength and direction of the linear relationship between two variables; Pearson’s correlation coefficient was used here. It ranges from −1 to 1: negative values reflect an inverse relationship and positive values a direct relationship. A value of 0 reflects no linear relationship, and the closer the coefficient is to −1 or 1, the stronger the relationship. Table 2 presents the high and medium-high correlations, which indicate strong associations between the variables.

Table 2 Correlations between variables

The variables with a high correlation are “Men should not act like women at work” (29) and “Women should not act like men at work” (30); “Men who are not masculine are good role models” (31) and “Women who are not feminine are good role models” (32); and “I feel limited by the gender labels people put on me” (52) and “I feel limited by the expectations people have of me because of my gender” (53). The variables with a medium-high correlation are “Women working in STEM areas have to be/act like men” (43) and “To have a successful career in STEM you need to think and act like a man” (44); “I feel limited by the gender labels people put on me” (52) and “In the past, I have been teased or harassed for acting like the opposite sex” (56); and “Science is useful in my everyday life” (59) and “Learning science has made me more critical in general” (60). All these items are included in the final version of the questionnaire.
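A minimal Python sketch of the Pearson coefficient described above, assuming each item's responses are stored as a list of numbers aligned by respondent. The six response pairs are hypothetical, loosely modelled on items 29 and 30, and are not taken from the study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long score lists.

    Undefined (ZeroDivisionError) when either list is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical answers of six respondents to two items such as 29 and 30.
item_29 = [1, 2, 2, 4, 5, 3]
item_30 = [1, 2, 3, 4, 5, 3]
print(round(pearson_r(item_29, item_30), 3))
```

From Python 3.10 onwards, `statistics.correlation` computes the same coefficient; the explicit version is shown to make the definition visible.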

3.1 First reliability analysis

Reliability was first calculated using the theoretical grouping of the items into seven dimensions. Table 3 presents the Cronbach’s alpha of each of the seven dimensions.

Table 3 Reliability statistics of the seven dimensions

As the Cronbach’s alpha values show, the results are not adequate, so the dimensionality had to be studied again.
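Cronbach's alpha compares the sum of the item variances with the variance of the total scale score: alpha = k/(k − 1) × (1 − Σ item variances / variance of totals), where k is the number of items. The sketch below assumes complete responses stored as one row per respondent and one column per item; it illustrates the formula and is not the procedure used to produce Table 3. The example scale is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Assumes complete data: every row has one score per item.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))              # one tuple of scores per item
    totals = [sum(row) for row in rows]   # each respondent's scale score
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical 5-respondent, 3-item scale (not the pilot data).
scale = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 3], [5, 5, 1]]
print(round(cronbach_alpha(scale), 3))
```

When items do not vary together, the total-score variance is small relative to the summed item variances and alpha drops, which is why low values point to items that do not belong to the same dimension.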

Also, item statistics for all dimensions are presented in Table 4.

Table 4 Item total statistics of the seven dimensions

Regarding the first reliability analysis of the Gender Ideology (IG) scale, we conclude that the items with low homogeneity are 27 and 40. For the Attitudes scale we conclude that the items with low homogeneity are 35, 54, 55, 57 and 58. For the Interests scale, the item with the lowest homogeneity is item 51. For the Perceived Image scale, the item with the lowest homogeneity is item 50. For the Women’s Skills scale, the item with the lowest homogeneity is item 45. For the Perception and Self-Perception scale, the item with the lowest homogeneity is item 56. For the Science Expectations scale, the item with the lowest homogeneity is 61.
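Item homogeneity is commonly assessed through the corrected item-total correlation, i.e. the correlation between an item and the sum of the remaining items of its scale. The sketch below computes it for each item and flags items below a cut-off. The 0.30 threshold is a frequently used rule of thumb assumed here, not a value stated in this paper, and the function names and data are hypothetical.

```python
from math import sqrt


def _pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation (undefined if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: list[list[float]]) -> list[float]:
    """Correlation of each item with the sum of the *other* items of the scale."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]   # total score without item j
        result.append(_pearson(item, rest))
    return result


def low_homogeneity(rows: list[list[float]], threshold: float = 0.30) -> list[int]:
    """Zero-based indices of items whose corrected item-total r is below threshold.

    The 0.30 default is an assumed rule of thumb, not a value from the paper.
    """
    return [j for j, r in enumerate(corrected_item_total(rows)) if r < threshold]


# Hypothetical scale in which the third item runs against the other two.
scale = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 3], [5, 5, 1]]
print(low_homogeneity(scale))  # prints [2]
```

Excluding the item from its own total avoids inflating the correlation, since an item always correlates with a sum that contains it.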

3.2 Exploratory factor analysis

Although seven dimensions were defined at the theoretical level, Exploratory Factor Analysis was applied because the theoretically constructed dimensions, i.e., the dimensionality and the items that compose it, yielded low reliability statistics. This means that the items do not behave homogeneously, so items that make up a dimension at the theoretical level do not in practice form that dimension. Exploratory Factor Analysis was also applied because the content of two of the theoretical dimensions, Women’s Skills (WM) and Perceived Image (PI), can be absorbed by other dimensions.

We attribute the mismatch between the theoretical dimensionality and the empirical results, as verified by the first reliability analysis, to the fact that the instrument measures people’s opinions on a topic of social impact such as gender. A diversity of opinion is to be expected, which also produces more variability in the responses.

Exploratory Factor Analysis was therefore applied after the first Reliability Analysis. Following the theory, we started with seven dimensions; the aim was to reduce them to dimensions whose items share meaning.

Before using Exploratory Factor Analysis, the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity were applied. The minimum recommended KMO value for Exploratory Factor Analysis to be used effectively is 0.5. For the study sample, the KMO measure of sampling adequacy is 0.588 (Table 5), above this minimum, so the Exploratory Factor Analysis could proceed.

Table 5 KMO and Bartlett’s test
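The KMO statistic compares the observed correlations with the partial correlations obtained from the inverse of the correlation matrix, while Bartlett's test checks whether the correlation matrix differs from an identity matrix. The sketch below computes the overall KMO and Bartlett's chi-square statistic from a correlation matrix using only the Python standard library. The helper names and the example matrix are hypothetical, and the code omits the p-value.

```python
from math import log, sqrt

Matrix = list[list[float]]


def invert_with_logdet(m: Matrix) -> tuple[Matrix, float]:
    """Gauss-Jordan inverse of a positive-definite matrix plus ln(det)."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    logdet, sign = 0.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        p = a[col][col]
        if p < 0:
            sign = -sign
        logdet += log(abs(p))
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    if sign < 0:
        raise ValueError("negative determinant: not a valid correlation matrix")
    return [row[n:] for row in a], logdet


def kmo(r: Matrix) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv, _ = invert_with_logdet(r)
    n = len(r)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Partial correlation of i and j controlling for all other variables.
            partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
            sum_r2 += r[i][j] ** 2
            sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)


def bartlett_sphericity(r: Matrix, n_obs: int) -> tuple[float, int]:
    """Bartlett's chi-square statistic and its degrees of freedom."""
    p = len(r)
    _, logdet = invert_with_logdet(r)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    return chi2, p * (p - 1) // 2


# Hypothetical 3-variable correlation matrix; n_obs matches the pilot sample size.
R = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
print(round(kmo(R), 3), bartlett_sphericity(R, n_obs=115))
```

The p-value of Bartlett's statistic follows from the chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf(chi2, df)` if SciPy is available. For any two-variable matrix with a non-zero correlation, the overall KMO is exactly 0.5, which is a convenient sanity check.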

Rotation was applied to make the factor solution easier to interpret; the rotated solution can also be plotted as a cloud of points (Fig. 3). The Exploratory Factor Analysis in this study used Principal Component Analysis as the extraction method with Oblimin rotation, which yielded 13 components.

To interpret the total variance explained (Table 6), both the eigenvalues and their cumulative sum must be considered. The total variance explained indicates how much of the variability the model accounts for: the 13 components extracted by principal component analysis explain 70% of the variability. However, the aim is to reduce the number of dimensions. As for the eigenvalues, a component with an eigenvalue below 1 explains less variance than a single standardised variable, so only components with eigenvalues above 1 are retained. In this case, all 13 components have eigenvalues above 1, but the first five show the best values, so the aim is to define five dimensions for the new dimensionality.

Table 6 Total variance explained
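Because the principal components are extracted from a correlation matrix, the eigenvalues sum to the number of variables, so each eigenvalue divided by that total is the share of variance its component explains. The sketch below applies the Kaiser criterion (eigenvalue > 1) and computes cumulative explained variance for the k largest components. The eigenvalues in the example are hypothetical, not those of Table 6.

```python
def kaiser_retention(eigenvalues: list[float]) -> tuple[int, float]:
    """Number of components with eigenvalue > 1 and the variance share they explain.

    For a PCA of a correlation matrix the eigenvalues sum to the number of
    variables, so dividing by that sum gives proportions of total variance.
    """
    ordered = sorted(eigenvalues, reverse=True)
    retained = [ev for ev in ordered if ev > 1.0]   # strict: exactly 1.0 is dropped
    return len(retained), sum(retained) / sum(ordered)


def cumulative_share(eigenvalues: list[float], k: int) -> float:
    """Share of total variance explained by the k largest components."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical eigenvalues for six standardised variables (they sum to 6).
eigs = [2.5, 1.5, 1.0, 0.5, 0.3, 0.2]
n_kept, share = kaiser_retention(eigs)
print(n_kept, round(share, 3), round(cumulative_share(eigs, 3), 3))
```

The Kaiser count is only an upper guide: as in this study, the scree plot and theoretical coherence can justify keeping fewer components than the criterion allows.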

The scree (sedimentation) plot in Fig. 2 visually represents the eigenvalues reported in the total variance explained (Table 6).

Fig. 2 Sedimentation graph. Source: Own production with SPSS

Table 7 presents the component matrix. Although 13 components were extracted, the variables do not load equally on all of them. Eleven components would retain high loadings, but only three of them comprise at least three items, which confirms that factor reduction is required.

In Table 7, each variable is marked in bold under the component to which it is assigned in the proposed new dimensionality. The scree plot results (Fig. 2) were also considered for this composition. The new dimensionality seeks coherence between variables and dimensions, the smallest possible number of dimensions, and acceptable metric values, which made it possible to reduce the factors to five dimensions. For this reason, for some variables the cell with the highest loading appears in italics: although it is the value closest to 1 or −1, it lies far from the main components or is not coherent with the other variables.

Table 7 Component matrix
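The mechanical first step behind Table 7, assigning each item to the component on which it loads most strongly, can be sketched as follows. The 0.40 minimum loading, the item labels and the example loadings are assumptions for illustration; as described above, the authors' final assignment also weighed theoretical coherence, which no threshold captures.

```python
from typing import Optional


def assign_items(loadings: dict[str, list[float]],
                 min_loading: float = 0.40) -> dict[str, Optional[int]]:
    """Map each item to the zero-based component with its largest |loading|.

    Items whose strongest absolute loading is below min_loading map to None
    so they can be reviewed by hand (an assumed cut-off, not the paper's).
    """
    assignment: dict[str, Optional[int]] = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda c: abs(row[c]))
        assignment[item] = best if abs(row[best]) >= min_loading else None
    return assignment


# Hypothetical loadings of three items on two components.
example = {"item_a": [0.72, 0.10], "item_b": [-0.20, -0.61], "item_c": [0.31, 0.25]}
print(assign_items(example))
```

Absolute values are compared because a strong negative loading ties an item to a component as firmly as a positive one.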

Finally, Fig. 3 shows the component plot in rotated space, which shows how close or far apart the grouped items are.

Fig. 3 The component graph in rotated space. Source: Own production with SPSS

3.3 Second reliability analysis

Once the Exploratory Factor Analysis had been carried out, the five new scales or dimensions were formulated. The Women’s Skills dimension was absorbed by Gender Ideology, and the Perceived Image dimension by Interests and Attitudes. The Gender Ideology, Attitudes, Interests, Perception and Self-Perception, and Expectations about Science scales were maintained. The Reliability Analysis was therefore applied again to the reformulated dimensionality, to examine how the items behave within the new dimensions and to group them accordingly.

Table 8 presents the Cronbach’s alpha of the five dimensions.

Table 8 Reliability statistics of the five dimensions

The results show that reliability improved considerably after the construct was revised. The Gender Ideology, Interests, and Perception and Self-perception scales could still improve their Cronbach’s alpha, so it is advisable to identify which of their items show low homogeneity.

Also, item statistics for the dimensions are presented in Table 9.

Table 9 Item total statistics of the five dimensions

The second reliability analysis of the Gender Ideology scale concludes that the items with low homogeneity are 25 (All humans are fundamentally the same, regardless of their gender) and 36 (Women have the same technical skills as men). It was decided to eliminate these two items, reducing the scale from 15 items to 13 to improve the scale’s reliability. For the Perception and Self-perception scale we concluded that the items with low homogeneity are 27 (Men and women have different but equally useful ways of accomplishing tasks), 35 (The well-being of the family is more important than the rewards of work) and 40 (Women and men have equal employment opportunities in ICT careers). It was decided to eliminate these three items, reducing the scale from 11 items to 8 to improve the scale’s reliability.
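One way to check that removing a low-homogeneity item improves reliability, as intended here, is to recompute alpha without it; this is the quantity SPSS reports as "Cronbach's Alpha if Item Deleted" in its item-total statistics. The minimal sketch below assumes complete responses (one row per respondent). The data are hypothetical, and the third item is deliberately reverse-patterned so that deleting it raises alpha.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix (complete data)."""
    k = len(rows[0])
    items = list(zip(*rows))
    totals = [sum(row) for row in rows]
    return k / (k - 1) * (1 - sum(variance(c) for c in items) / variance(totals))


def alpha_if_deleted(rows: list[list[float]]) -> list[float]:
    """Alpha of the scale recomputed with each item left out in turn.

    Needs at least three items so that every reduced scale keeps two.
    """
    k = len(rows[0])
    if k < 3:
        raise ValueError("need at least three items")
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in rows]) for j in range(k)]


scale = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 3], [5, 5, 1]]
full = cronbach_alpha(scale)
for j, a in enumerate(alpha_if_deleted(scale)):
    flag = "  <- deleting this item raises alpha" if a > full else ""
    print(f"item {j}: alpha if deleted = {a:.3f}{flag}")
```

A rise in alpha alone does not justify deletion; as the next paragraph shows, the content of each candidate item is also checked against the theoretical dimensionality.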

Before the items were excluded, the initial theoretical composition of the dimensions was reviewed against the instruments that inspired the questionnaire (Banchefsky & Park, 2018; Duncan et al., 2019; Godwin, 2014; López Robledo, 2013; Rossi Cordero & Barajas Frutos, 2015) to verify that the deleted items were not fundamental components of the theoretical dimensionality. Items 25 and 35 do not contain content directly linked to the gender gap in STEM; their content is cross-cutting. Item 27 applies to STEM but equally to other sectors. Finally, the content of item 40 is covered by other items retained in the questionnaire.

No items with low homogeneity were found in the Science Expectations, Interests and Attitudes scales, so no items were removed from them.

Finally, since the instrument was reduced from 37 to 32 items, a new Reliability Analysis had to be applied to the dimensions affected by the reductions: the Gender Ideology scale and the Perception and Self-perception scale.

3.4 Third reliability analysis

The third Reliability Analysis was applied to the Gender Ideology and Perception and Self-perception dimensions, the only ones from which items were eliminated (items 25, 27, 35, 36 and 40). For the Expectations about Science, Attitudes and Interests dimensions, the results of the second Reliability Analysis were retained.

Table 10 presents the Cronbach’s alpha of the dimensions.

Table 10 Reliability statistics of the dimensions

As Table 10 shows, removing the five items with low homogeneity substantially improved Cronbach’s alpha for both scales.

Also, item statistics for the dimensions are presented in Table 11.

Table 11 Item-total statistics of the dimensions

Table 12 shows the Cronbach’s alpha values resulting from the third reliability analysis for each scale (dimension).

Table 12 Reliability statistics for the five new dimensions of the instrument

Finally, Table 13 shows the items that make up each dimension in the final structure.

Table 13 Final dimensions with their items

3.5 Definition of dimensions

Gender ideology is related to the social conception of gender roles and patterns. It may be marked by philosophies of equal opportunity and inclusion of gender diversity, or it may be characterised by binary logics based on masculinity and femininity, understood as canons to be followed (Keller, 1995). Regarding perception, misperceptions about careers in STEM domains significantly hinder women from pursuing STEM career paths (Diekman et al., 2010). Self-perception, in turn, may lead to low levels of interest, enrolment and persistence in these fields. Expectations about science relate to the outcomes expected from science and from studying it. Outcome expectations are beliefs about the consequences of performing particular activities (Lent et al., 1994), in this case, of studying or not studying STEM domains.

Attitudes towards science, according to Osborne et al. (2003), can be understood as the feelings, beliefs and values that a person holds about an object, which here may be science itself, school science, the impact of science on society, the science-based labour market, or scientists themselves. Finally, regarding interest, Blázquez et al. (2011) investigated the interest of students in Spain in pursuing higher education in engineering. Their results reveal that 30% of the participants in their pilot study were not yet at the age of starting higher education, which means that some students decide which studies to pursue before they are ready to make that choice. Although education systems aim to encourage interest in STEM fields, students lose interest over time, which results in declining enrolment. Studies such as Blickenstaff (2005) and Sadler et al. (2012), among others, show that women tend towards health and social sciences, whereas men tend towards technical and exact sciences, which suggests that students’ enjoyment of the subjects and their interest need to be studied in greater depth.

3.6 Post-validation analysis of the instrument

3.6.1 Frequencies, descriptive statistics and correlations of the new dimensions of the instrument

After validation, the statistics for the five dimensions were calculated and are shown in Table 14.

Table 14 Descriptive statistics of the dimensions

3.6.2 Correlations

Pearson correlations between the scale scores were calculated to determine the degree to which the scales are associated with one another.

The correlations between the five scales are low to medium-low (Table 15). This is desirable, because it indicates that each scale measures a distinct element of the instrument rather than overlapping with the others.

Table 15 Correlations for the scales

3.6.3 Sample distribution

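The Kolmogorov-Smirnov test (Table 16) was also applied to the five scales to check whether their scores follow a normal distribution. For all five scales, the p-value (asymptotic two-tailed significance) is below 0.05, so the null hypothesis of normality is rejected: the score distributions differ significantly from a normal distribution, which supports the use of non-parametric tests in the analyses that follow.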
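The one-sample Kolmogorov-Smirnov statistic D is the largest vertical distance between the empirical cumulative distribution of the scores and a normal distribution fitted with the sample mean and standard deviation; the asymptotic p-value comes from the Kolmogorov distribution evaluated at sqrt(n) * D. The standard-library sketch below is illustrative, and its function names are hypothetical. Because the mean and SD are estimated from the same data, this classical p-value is conservative (the Lilliefors correction addresses this); which variant produced Table 16 is not stated here.

```python
import math
from statistics import NormalDist, mean, stdev


def kolmogorov_sf(lam):
    """P(K > lam) for the asymptotic Kolmogorov distribution."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # Series for P(K <= lam) that converges quickly for small lam.
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
            for k in range(1, 21))
        return 1.0 - cdf
    # Alternating series that converges quickly for large lam.
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 21))


def ks_normality(sample):
    """Return (D, asymptotic p) against a normal with the sample's mean and SD."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Gap between the fitted CDF and the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    p = kolmogorov_sf(math.sqrt(n) * d)
    return d, min(max(p, 0.0), 1.0)


# Invented example: a clearly bimodal set of scale scores.
d, p = ks_normality([1] * 50 + [5] * 50)
print(f"D = {d:.3f}, p = {p:.2g}")  # p far below 0.05: normality rejected
```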

Table 16 Kolmogorov-Smirnov test for one sample

3.6.4 Chi-Square cross-tabulations

A non-parametric test was needed to check whether responses to each item differ significantly across groups (two or more).

Because the items are categorical (ordinal) variables, Pearson’s chi-square test of independence was applied to contingency tables crossing each item with each socio-demographic variable. Since the chi-square test ignores the order of the categories, Kendall’s tau-b and tau-c (ordinal-by-ordinal measures) were also calculated.
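For a contingency table, the chi-square statistic compares observed cell counts with the counts expected under independence (row total * column total / n) and has (r - 1)(c - 1) degrees of freedom, while Kendall’s tau-b and tau-c summarise the ordinal association from concordant and discordant pairs of respondents. The standard-library sketch below implements these textbook formulas; the function names are hypothetical, the p-value routine is a simple series adequate for moderate statistics, and tables are assumed to have no empty rows or columns.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularised lower gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def pearson_chi_square(table):
    """Pearson chi-square test of independence on an r x c table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def _pair_counts(x, y):
    """Concordant, discordant, tied-on-x and tied-on-y pair counts."""
    conc = disc = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            conc += 1
        elif dx * dy < 0:
            disc += 1
    return conc, disc, ties_x, ties_y


def kendall_tau_b(x, y):
    conc, disc, ties_x, ties_y = _pair_counts(x, y)
    n0 = len(x) * (len(x) - 1) / 2
    return (conc - disc) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


def kendall_tau_c(x, y):
    # Stuart's tau-c, which adjusts for tables that are not square.
    conc, disc, _, _ = _pair_counts(x, y)
    m = min(len(set(x)), len(set(y)))
    return 2 * m * (conc - disc) / (len(x) ** 2 * (m - 1))


# Invented counts: rows are two groups, columns are answers 1-4 to one item.
stat, df, p = pearson_chi_square([[12, 9, 6, 3], [4, 7, 10, 13]])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
# Invented respondent-level codes: group (1/2) and answer (1-4).
group = [1, 1, 1, 1, 2, 2, 2, 2]
answer = [1, 2, 2, 3, 2, 3, 4, 4]
print(kendall_tau_b(group, answer), kendall_tau_c(group, answer))
```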

The socio-demographic variables used for the comparison were: gender; the area in which participants live; the branch of their university studies; their motivation for choosing those studies; the order of preference in which they listed their current studies in the university admission process; prior interest in STEM-related higher education; participation in any STEM initiative or activity before university; whether a family member or other person in their environment has studied STEM, has been their academic reference, or has judged their educational decision; the socio-economic level they attribute to the area in which they grew up; and their parents’ educational level.

Regarding Pearson’s chi-square, 28 of the 32 items in the final instrument show significant differences for at least one socio-demographic variable. Item 56 shows the most, with significant differences for 11 variables.

3.6.5 Hypothesis testing for the dimensions

Finally, hypothesis tests were applied to the five dimensions (criterion variables), with the instrument’s socio-demographic variables as predictor variables. These tests were also non-parametric: the Mann-Whitney U test was used to compare two independent groups, and the Kruskal-Wallis test to compare three or more independent groups.
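Both tests work on the ranks of the pooled scores, with tied scores given the average of their ranks. Mann-Whitney U counts how often scores in one group exceed those in the other, and Kruskal-Wallis H extends the comparison to k groups, referred to a chi-square distribution with k - 1 degrees of freedom. The standard-library sketch below uses the large-sample approximations (normal for U, chi-square for H) with tie corrections; exact small-sample p-values, which some packages report, are not reproduced, the names are illustrative, and at least two distinct values are assumed.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def tie_sum(values):
    """Sum of t**3 - t over groups of tied values (used in tie corrections)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularised lower gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(-z + a * math.log(z) - math.lgamma(a + 1)) * total)


def mann_whitney_u(a, b):
    """U statistic and two-sided p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_sum(pooled) / (n * (n - 1))))
    z = (u - n1 * n2 / 2) / sigma
    return u, 2 * NormalDist().cdf(-abs(z))


def kruskal_wallis(*groups):
    """Tie-corrected H statistic and p-value from chi-square with k - 1 df."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    h /= 1 - tie_sum(pooled) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Invented dimension scores for two and three groups of respondents.
print(mann_whitney_u([2.1, 2.8, 3.0, 3.4], [3.1, 3.9, 4.2, 4.4]))
print(kruskal_wallis([2.1, 2.8, 3.0], [3.1, 3.4, 3.9], [4.0, 4.2, 4.4]))
```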

In total, 18 significant differences associated with socio-demographic variables were found across the five dimensions. The results are shown in Table 17.

Table 17 Results of the significant differences of the hypothesis tests by dimensions

4 Discussion

Interpreting the hypothesis-test results for the five scales in this sample, the factors that modulate the Gender Ideology dimension include the environment in which the person lives, their motivations for choosing higher education, whether they had shown interest in STEM domains before university, and the higher education they are pursuing. These results are consistent with Social Cognitive Career Theory (Lent et al., 1994; Lent & Brown, 1996), which holds that career choice is shaped by intrinsic, extrinsic and environmental factors. Bourdieu (1984) likewise argues that human beings are embedded in social development, which conditions how they think, act and decide.

In the sample studied, living in a rural environment is associated with a greater predisposition to gender stereotyping, which could be explained by the traditional culture in which specific social patterns are still ingrained (Thébaud & Charles, 2018). Conversely, choosing higher education out of a motivation to work on projects is associated with a significantly lower predisposition to gender stereotypes. In this sense, authors such as Finzel et al. (2018) advocate working on motivation and interests to close the gender gap in STEM studies. At the same time, those who participated in the pilot experience and who showed an interest in STEM domains before their university studies show lower mean values of gender stereotyping bias. Brauner et al. (2018), who seek to close the gap through initiatives, likewise advocate working on interests to reduce horizontal segregation in education. By contrast, the studies finally pursued point in the other direction: participants studying engineering and chemistry disciplines show higher levels of stereotyping than those studying social sciences and health sciences, as Borsotti (2018) also found.

On the Perception and Self-perception scale, participants who perceive the area in which they grew up as having a medium-low or medium socio-economic and cultural level show higher levels of stereotyping. This result aligns with Bourdieu’s (1980) theory of social representation based on social capital. Regarding motivations, choosing one’s studies out of a desire to travel is associated with lower levels of stereotyping. In addition, participants whose current studies were their first choice show lower levels than those for whom they were a second or lower choice. All these findings are in line with the work of Brauner et al. (2018). Finally, communication between teachers and students deserves careful attention, since students who felt judged by a male teacher show higher levels of mistaken perceptions and self-perceptions. Along these lines, Papadakis (2018) and Papadakis et al. (2018) advocate prevention within education and attention to the hidden curriculum.

On the Science Expectations scale, participants living in urban settings have the highest expectations about science, followed by those living in intermediate areas and, lastly, those in rural settings. First-year and younger students also have higher expectations. Again, the motivation to work on projects yields favourable results: participants who chose their studies for this reason, and those who showed interest in STEM before higher education, have higher expectations of science. In this sense, authors such as Brauner et al. (2018), Lent et al. (1994), Lent & Brown (1996) and Osborne et al. (2003) argue that horizontal segregation is induced by factors extrinsic to the individual but also by intrinsic ones, such as the individual’s educational background. It is therefore essential to work on students’ motivations. Finally, participants without vocational training and those studying social sciences and health sciences have higher expectations than those with vocational training or those studying engineering and chemistry disciplines.

On the Attitudes scale, the father’s educational level is related to the results, with lower scores the more advanced his studies. On the Interests scale, parental education again has an influence; in this case, the mother’s educational level is associated with higher scores. It is therefore essential to study both the educational environment and the family and social environment. Knowing which people in students’ environment have studied STEM fields, which have acted as their references and which have judged their decision makes it possible to map the networks of influence on the choice of higher education (Lent et al., 1994; Lent & Brown, 1996).

The factors that condition the choice of higher education, and the elements that shape opinions about these options according to gender, are therefore diverse (Blickenstaff, 2005; Lent et al., 1994). Personal, academic, family and social factors are part of the construct that modulates the presence or absence of bias (Blázquez et al., 2011; Lent et al., 1994). Once the direction of these factors is known, preventive and corrective measures can be applied from a socio-educational perspective (Lent & Brown, 1996).

Finally, initiatives are a valuable mechanism for responding to the gender gap in STEM higher education from a gender equality perspective. They are strategies through which interests and motivations can be directed and strengthened. This idea is supported in the literature by authors such as Blickenstaff (2005), Cadaret et al. (2017), Cheryan et al. (2013), Corbett & Hill (2015), Diekman et al. (2010) and Ertl et al. (2017). Strategies for recruiting women and girls into STEM studies include mentoring, tutoring and role modelling.

5 Conclusions

The study presented in this article arose from the need to design a questionnaire to study university students’ opinions on STEM studies according to gender. Analysing stereotypes and biases is relevant to addressing the gender gap in educational settings, in line with the guidelines of authors such as Sadler et al. (2012). This matters because horizontal segregation leads to a loss of diversity in STEM higher education, as Jacobs et al. (2017), Moote et al. (2020) and Snyder et al. (2018) point out. Following a methodological procedure for designing and validating the instrument, gender stereotypes in science, technology, engineering and mathematics can now be identified. The instrument has been named Questionnaire with university students on STEM studies in Higher Education (QSTEMHE).

The results obtained with the instrument support the design of preventive and direct interventions, some of which can be applied directly in classrooms or through campaigns that bring STEM areas closer to the student population (García Peñalvo et al., 2019).

The results also confirm what was expected: the family, social and peer-group environment, together with educational references, condition the decision on which studies to pursue. Moreover, perceptions of and opinions about STEM studies in higher education are also conditioned by internal elements such as motivations, interests, attitudes, self-confidence and self-efficacy. Thus, the two theories on which this research is based, Social Cognitive Career Theory (Lent et al., 1994; Lent & Brown, 1996) and Bourdieu’s theory (Bourdieu, 1980), are confirmed.

Finally, some important limitations need to be considered. First, the COVID-19 pandemic slowed down data collection, as classes moved online and it was impossible to visit classrooms in person. Online data collection required several e-mail reminders, and it is hoped that the study will be replicated in the future with a larger sample to confirm the results obtained. Another limitation was the reluctance of some people to answer the questionnaire, who stated that they did not find it sufficiently interesting because it was a gender study. This reluctance could be studied in depth in the future, as we also consider it a result. Finally, we assume that the lockdown conditions, which made it impossible to visit classrooms in person, together with the reluctance of some people to respond, led to fewer responses from men than from women. Once the restrictions due to the health crisis are relaxed, the final version of the instrument is planned to be applied to a larger sample. To this end, the aim is to extend the questionnaire to students from all Spanish universities, achieving representative groups and balance between the different gender groups. Another future line of work is to extend the questionnaire to other European contexts and, by adapting the contextual questions, to non-European countries where its application can be studied.