1 Introduction

Proving has a central role in mathematical practices. Proofs are used to verify mathematical results, deduce new results from existing facts and communicate one’s reasoning to other mathematicians. In mathematics education, they can also be used to convey to students important mathematical methods and strategies (Hanna & Barbeau, 2010). For these reasons, university mathematics education generally emphasises proving. However, in many countries, proving is rarely studied in school, and even then only in specific contexts, such as geometry (see e.g., Furinghetti et al., 2011; Mingus & Grassl, 1999). Not surprisingly, students find proof one of the most difficult areas of mathematics (Moore, 1994), and the transition to proving when entering university from secondary education is particularly challenging for them (Selden, 2012). Mathematical proving has been studied widely and from many perspectives, such as proof construction, reading and validation (e.g., Alcock & Weber, 2010; Hodds et al., 2014; Selden & Selden, 2009). Students’ beliefs about the role and nature of proof have also been investigated (e.g., Almeida, 2000; Conner et al., 2011), as have pre-service teachers’ views on the teaching of proof (e.g., Lesseig & Hine, 2022; Schwarz et al., 2008). Our standpoint differs from that of the aforementioned studies. In this study, our focus is on certain affective dimensions related to proving, each of which we consider an attitude towards proving. We chose this focus because affective attributes seem to be underrepresented in the research literature on mathematical proving, although they play an important role in other areas of mathematics education research.

There is no single definition of attitudes in the literature. In this work, we consider as attitude any persistent affective disposition or belief that may evoke emotions and guide behaviour (cf. Leder, 1987; McLeod, 1992). Studying such attitudes, including enjoyment, appreciation and self-confidence, has a long history in mathematics education research (Di Martino & Zan, 2015; Hannula, 2002; Leder, 1987; McLeod, 1992). Attitudes have been considered important predictors of study behaviour, and hence also of achievement in mathematics (e.g., Singh et al., 2002). For example, students’ beliefs in their own capabilities have a positive effect on achievement in mathematics (Pajares & Kranzler, 1995), as does considering mathematics useful and valuable (Greene et al., 1999).

It is also known that attitudes towards mathematics differ between genders. Female students have been observed to have lower self-confidence in mathematics, to suffer from more mathematics anxiety than male students and to be more poorly motivated to study mathematics than male students (Frost et al., 1994; Hembree, 1990; Middleton & Spanias, 1999). These differences are known to hinder female students’ performance and are suspected to contribute to the underrepresentation of women in science careers (Betz & Hackett, 1983; Frost et al., 1994).

In this study, we focus on proving and on four attitudinal variables that we have assumed to play a significant role in learning proving. These are self-efficacy, anxiety, appreciation and motivation. Being motivated, believing in one’s own abilities and finding value in the topic of study can assist in overcoming the mental challenges of the transition to proof. However, there might be gender differences in these attitudinal variables. We will justify our assumptions by presenting examples from existing literature on attitudes towards mathematics. We will then explore the research gap by studying relationships between attitudes, gender and performance during the transition to university mathematics and proving.

2 Aim of the study

We investigated students’ attitudes towards proving in a 14-week undergraduate mathematics course. The purpose was to explore how students’ attitudes towards proving change during such a course, and how they might influence achievement. We also investigated what role prior performance and gender play in this interaction. This initial study focuses on one teaching context, paving the way for more extensive studies.

To measure the attitudes, we combined two existing questionnaires: one measuring attitudes towards mathematics (ATMI; Tapia, 1996) and one measuring proof self-efficacy (PSE; Iannone & Inglis, 2010). The first was chosen because it was rather short and contained items related to confidence, anxiety, value, enjoyment, and motivation, which we considered important in the context of proving. The second was chosen because it focused especially on proving and might therefore strengthen the validity of the full instrument. Based on these questionnaires, we developed a new instrument that measured four attitude variables: self-efficacy, anxiety, appreciation and motivation.

First, we were interested in the attitudes that students have when they come to university. It is known that previously acquired skills are correlated with self-efficacy beliefs and anxiety towards a subject (Sitzmann & Yeo, 2013; Ma & Kishor, 1997; Ho et al., 2000; Miller & Bichsel, 2004), as well as motivation (Middleton & Spanias, 1999; Murayama et al., 2013; Singh et al., 2002). It is also known that there are gender differences in attitudes towards mathematics (e.g., Frost et al., 1994; Lahdenperä, 2018; Leder, 1995; Middleton & Spanias, 1999; Wigfield & Meece, 1988), so it is likely that these differences would be found also in attitudes towards proving. This leads to our first research question:

  • RQ1. How are students’ attitudes towards proving related to their gender and high school mathematics performance at the beginning of university studies?

Next, the focus course has been designed to smooth the transition to university mathematics, and a large part of the course is dedicated to teaching mathematical proving. It would be expected that this kind of course might have a positive effect on students’ attitudes towards proving. It is also possible that the learning environment may either increase or decrease a gender gap in attitude variables. To investigate this, we compared pre- and post-course surveys to answer the following research question:

  • RQ2. How did students’ attitudes towards proving change during the course, and what effect did gender have on the change?

Finally, we wanted to explore to what extent attitudes can have a causal effect on proving performance. To this end, we examined the students’ performance in the final project of the focus course, comparing it with the attitude variables measured at the beginning of the course and controlling for high school mathematics performance. As the final project included tasks related to proving, we aimed to answer the following question:

  • RQ3. How do differences in attitudes towards proving affect performance in proof-related tasks?

In the final question, we also distinguished between male and female students.

3 Theoretical background

3.1 Attitudes and their measurement in mathematics education

By attitudes in the context of learning, we refer to learned predispositions that evoke emotional responses towards the subject one is learning, involve beliefs and values about the subject, may guide the learner’s behaviour and are fairly persistent (cf. Leder, 1987; McLeod, 1992). For example, a learner can dislike geometry, find algebra boring, consider mathematics important or themselves good at it. Attitude is not the same as emotion, since emotions can be temporary (e.g., feeling sad after a failed exam). Attitudes are also not identical to beliefs, in that beliefs can be completely unemotional (e.g., a belief that mathematics is needed in engineering). Typically, attitudes are classified as positive or negative, and sometimes learners are described simply as “having a positive or negative attitude” towards the subject. We, however, consider attitude as a multidimensional construct.

Historically, attitudes are situated in the affective domain. The affective domain, or affect, refers to those psychological processes that reside outside cognition. Regarding affect in mathematics education, McLeod (1992) considered a tripartite model consisting of beliefs, attitudes and emotions that would play a major role in organising the affective domain. Later, DeBellis and Goldin (2006) added values to this model. Although the model is generally considered valuable and important, it has been difficult to formulate clear, separate definitions for the individual concepts (Di Martino & Zan, 2011). Attitudes, in particular, have proved difficult to define comprehensively (Leder, 1987). One theoretical characterisation of attitudes was offered by Hannula (2002), who described attitude as “a category of behaviour that is produced by different evaluative processes.” These evaluative processes would occur in various learning situations, and would be directed by emotions, expectations and values. On a different note, Di Martino and Zan (2010) used a grounded theory approach to students’ narratives in an attempt to distil a pragmatic definition of attitude. They produced a three-dimensional model consisting of emotional disposition such as like or dislike, vision of mathematics including value and usefulness, and perceived competence regarding beliefs about self.

Apart from theoretical investigations, there have been many attempts at measuring attitudes towards mathematics, and many kinds of self-report instruments have been developed for this purpose (e.g., Di Martino & Zan, 2015; Leder, 1985). The interest in measurement may have stemmed from the idea that attitudes predict achievement (Aiken, 1970, as cited in Di Martino & Zan, 2015), although the meta-analysis by Ma and Kishor (1997) found the overall effect size between the two to be too small to have meaningful implications for educational practice. It can be noted that the attitudes measured with self-report instruments often include beliefs, values, emotions and other constructs that may not reflect a particular theoretical definition of attitude. In fact, Leder (1985) notes that research conducted with these instruments typically uses a definition of attitude derived from what is measured. Leder states that this may stem from the difficulty of matching the conceptual and operational definitions. The measurements may still provide valuable information “provided the measures used are both valid and reliable,” reflecting an “appropriate conceptualisation of attitude” (Leder, 1985, p. 19).

An early and influential instrument was produced by Fennema and Sherman (1976), measuring attitudes towards mathematics of students in grades 6–12 in nine dimensions: attitude towards success, confidence, anxiety, active involvement, beliefs about usefulness, perceiving mathematics as a male domain, and parents’ and teachers’ influence. Later, Tapia and Marsh (Tapia & Marsh, 2002; Tapia, 1996) developed a questionnaire for high school and university students based on similar dimensions, aiming to make the instrument shorter and more relevant to older students. The instrument, called Attitudes toward Mathematics Inventory (ATMI) measures four dimensions: sense of security (later self-confidence), value, enjoyment, and motivation. Sense of security combines positive efficacy beliefs and feelings of anxiety, value denotes beliefs of usefulness of mathematics to oneself, enjoyment measures positive feelings and anticipations, and motivation describes willingness to choose and perform tasks involving mathematics. This shorter instrument forms the basis of measurements conducted in the current study.

3.2 Mathematical proving and attitudes

As attitudes have an important role in learning mathematics in general, one would assume that they also affect learning of proof. Stylianou et al. (2015) investigated students’ views of the meaning of proof and how these views are related to students’ attitudes, beliefs and experiences. They studied 535 students in six different universities, and found that high-performing students appreciated proofs more and had more positive beliefs about themselves as learners of proof than low-performing students. Furinghetti and Morselli (2007, 2009), on the other hand, investigated qualitatively how affect and cognition are intertwined in the proving processes of university students. Their studies imply that students’ affects can act as driving forces in the proving process and either hinder or facilitate proof construction.

Iannone and Inglis (2010) designed a questionnaire that measures students’ proof-related self-efficacy, that is, students’ belief in their capability to produce proofs. In their study, Iannone and Inglis analysed the relationship of students’ self-efficacy and their performance in proof tasks. They administered the questionnaire to 76 first-year students in a UK university, and found a positive correlation between students’ self-efficacy at producing proofs and their actual proof production performance. Also, Viholainen et al. (2019) have investigated mathematics students’ self-efficacy beliefs about proof. Their study comprised 29 Finnish and Swedish mathematics undergraduate students. Their results imply that students experienced high motivation towards proofs. At the same time, students doubted their own skills concerning proofs. Common reasons behind their low self-efficacy were problems in regulating the proving process and a fear of making mistakes. Selden and Selden (2013) have hypothesised that self-efficacy supports persistence in the process of constructing a proof, and the successes the students experience in writing proofs lead to a higher self-efficacy.

3.3 Background for attitude variables used in this study

The current study measured attitudes with a new instrument developed by combining and modifying two existing instruments (ATMI; Tapia, 1996; PSE; Iannone & Inglis, 2010). In this subsection, we characterise the four attitude variables measured by the new instrument and relate them to existing literature and related concepts. These variables are Self-efficacy, Anxiety, Appreciation and Motivation.

Self-efficacy is a student’s belief in their capability to prove or learn proofs. The general concept of self-efficacy was studied extensively by Bandura (1977). In the Fennema–Sherman instrument, the similar construct was named “confidence in learning mathematics” (Fennema & Sherman, 1976). Self-efficacy is an accurate predictor of academic achievement (e.g., Bandura & Schunk, 1981; Bartimote-Aufflick et al., 2016; Pajares & Graham, 1999; Zimmerman, 2000). It is linked to choice of activities, level of effort, persistence, emotional reactions and mathematical achievement (Bandura & Schunk, 1981; Pajares & Graham, 1999; Zimmerman, 2000). There are also indications that mathematics self-efficacy affects students’ selection of science-based majors in college (Hackett & Betz, 1989). In secondary school, Pajares and Kranzler (1995) found that self-efficacy influences performance to a degree similar to general mental ability. Also, Arens et al. (2022) have reported positive reciprocal relations between math self-efficacy and math test scores. Regarding mathematical proving, Iannone and Inglis (2010) have shown that students’ proof-related self-efficacy and performance correlate. Several studies have found gender differences in mathematics self-efficacy. In many studies, female students have been shown to have lower self-efficacy than male students (Betz & Hackett, 1983; Frost et al., 1994; Leder, 1995; Lahdenperä, 2018; Arens et al., 2022). According to Bandura’s (1997) hypothesis, the development of self-efficacy is affected by a person’s previous attainments, observing others, social persuasions, and emotional and physiological states.

Anxiety towards proving is a feeling of tension that interferes with proving activities. It can be compared to mathematics anxiety (for mathematics anxiety, see Richardson & Suinn, 1972). Numerous studies have shown a negative correlation between mathematics anxiety and student performance in all levels of education (e.g., Ma & Kishor, 1997; Ho et al., 2000; Miller & Bichsel, 2004). However, it is not clear what the causal direction between mathematics anxiety and performance is (e.g., Carey et al., 2016). Mathematics anxiety is closely linked to self-efficacy: people with low self-efficacy tend to experience high anxiety (Hoffman, 2010; Jain & Dowson, 2009; Pajares & Kranzler, 1995). A gender difference with respect to mathematics anxiety has been detected, and women are more likely to have mathematics anxiety than men (Devine et al., 2012; Hembree, 1990; Wigfield & Meece, 1988).

Appreciation of proof describes a student’s view that proving and learning about proofs either has some utility value, such as developing reasoning skills, or is intrinsically valuable. Appreciation is closely related to usefulness in the Fennema–Sherman instrument (Fennema & Sherman, 1976). It also plays an important role in the expectancy–value theory of achievement motivation (Eccles et al., 1983). Using this framework, Greene et al. (1999) found that the combination of intrinsic and utility value predicted achievement in high school mathematics, both directly and through task-specific goal setting. Studies on gender differences in value beliefs about mathematics have yielded inconsistent findings (Gaspard et al., 2015). Regarding proving, Stylianou et al. (2015) found that university students appreciated the role of proof in mathematics, with high-performing students considering proofs more important than low-performing students.

Motivation refers in this study to a mostly emotional disposition. It describes the conscious desire to engage with proving-related tasks and challenges, based on enjoyment and interest. It relates therefore mostly to intrinsic motivation (e.g., Deci & Ryan, 1985), stemming from internal feelings of enjoyment, but also to effectance motivation (White, 1959), stemming from the desire to succeed and overcome challenges. Motivation has been found to correlate with achievement in mathematics, and the effect is bidirectional: perceptions of success increase motivation (Middleton & Spanias, 1999) and motivation predicts achievement (e.g., Murayama et al., 2013; Singh et al., 2002). Middleton and Spanias (1999) pointed out in their literature review that motivation towards mathematics forms around the middle school years and is relatively stable, but can be affected by careful instructional design. They also mention a well-known gender gap, with girls being less motivated to study mathematics.

4 Design and methods

4.1 Context

The study took place in the undergraduate course Introduction to University Mathematics in a Finnish research-intensive university. In Finnish universities, students declare a specific major when they enter the university and focus on that subject from the beginning of their studies. They also choose one or more minor subjects. The students of the course Introduction to University Mathematics had mathematics as a major or minor subject. Common majors besides mathematics were computer science, economics, statistics and education. The course was a first-year course and typically the first university mathematics course for students. It lasted for one semester and was worth 5 ECTS (European Credit Transfer and Accumulation System) credits. The aim of the course was to support the transition from secondary to tertiary mathematics education. The main topics of the course included sets, functions, logic and proving. Proving was the main theme of the course, and students practised different kinds of proving strategies such as proving implications and equivalences, as well as indirect proofs and mathematical induction. Many of the students in the course studied other proof-based mathematics courses along with this course. The scale of the course was large: there were 600 students who showed some activity during the course (i.e., completed at least one task during the course).

The learning environment of the focus course was student-centred. It was implemented with the Extreme Apprenticeship method (Rämö et al., 2019, 2021), which is a form of inquiry-based mathematics education (Artigue & Blomhøj, 2013; Laursen & Rasmussen, 2019). Students started studying new topics by reading the course material and completing introductory, computerised tasks that gave them instant feedback. These tasks were based on the idea of proof frameworks by Selden et al. (2018), which help students understand the relationship between the logical structure of the statement to be proved and the structure of the proof. After the introductory tasks, the students deepened their understanding by completing more demanding pen-and-paper tasks.

Every other week one of the pen-and-paper tasks was assessed by the teaching team of the course. The teaching team comprised a responsible teacher and tutors who were undergraduate or graduate students. The team had weekly meetings in which mathematical topics and pedagogy were discussed. They gave feedback to the students on their solutions to the tasks, and the students could rewrite and resubmit their solutions. The students were offered an open learning space where they could spend as much time as they wanted. In the learning space, they could work collaboratively and receive help from the teaching team in completing the tasks and reading the course material. There were also lectures which focused on motivating the topics to the students, building the big picture and linking together different concepts. The importance of sharing one’s ideas, even though they might not be correct, was emphasised to the students.

At the end of the course, students completed a final project, which was a broad assignment that focused on proving and explaining one’s mathematical ideas in good mathematical style. The students had three weeks to complete the project. They could work together and ask for help from the teaching team, but the final submission had to be their own. The project contained parts requiring explaining in one’s own words and applying concepts and ideas with mathematics they had not previously seen, so that answers could not be directly copied from other students. Key parts concerned understanding a written proof and writing one’s own proofs. The project was assessed by the teaching team, and it counted for 15% of the course grade. The rest of the grade was determined by the weekly tasks completed by the students.

4.2 Participants

The participants in this study were those students in the course Introduction to University Mathematics who responded to digital surveys during the course and gave their informed consent for using their responses for research purposes. Responses from all students were used to develop the attitude survey instrument as described in the next subsection. There were 535 students (89% of course participants) who responded to the pre-course questionnaire, and 440 students (73% of course participants) who responded to the post-course questionnaire.

In the main analysis, the number of participants was reduced for several reasons. Firstly, only those participants who responded to both pre- and post-questionnaires were included, so that the change in attitude variables could be investigated (RQ2). Secondly, only those students who gave their gender as either female or male were included, as they were the only groups large enough to be analysed statistically for differences (RQ1 and RQ2).

Thirdly, only those participants who had taken the advanced syllabus in mathematics in the high school final examination (matriculation examination) and provided their grade in the questionnaire were included. This was done in order to use the high school final examination grade as a control variable. Finally, only those students who submitted the final project in the focus course were included, so that the effect of attitude variables on performance could be studied (RQ3). In the end, the number of participants in the main analysis was N = 267 (120 female, 147 male; 45% of course participants).

4.3 Data collection and factor analysis

Data was collected from participants using digital self-report surveys at the beginning and at the end of the focus course, that is, in September and in December of 2019. The attitude data was collected using a questionnaire based on two existing Likert scale questionnaires: the Attitudes Toward Mathematics Inventory (ATMI; Tapia, 1996) and Proof Self-Efficacy (PSE; Iannone & Inglis, 2010). The ATMI instrument measures four attitude variables in the context of mathematics: “sense of security,” “value,” “motivation” and “enjoyment,” using 40 question items and a 5-point Likert scale. The PSE instrument measures self-efficacy in the context of proving, using 10 items and a 5-point scale.

Both original instruments were translated into Finnish by one of the authors, and the items in ATMI were reworded to refer to proving instead of mathematics. For example, “I believe studying mathematics helps me with problem solving in other areas” was changed to “I believe that studying proofs helps me with problem solving in other areas” (VAL10). Those items in ATMI that could not be transformed were discarded. All items were combined into one set of 46 questions and their order was randomised. The 5-point answer scale was retained, ranging from “I strongly disagree” to “I strongly agree.” The question set was delivered in the same form and in the same order in both the pre- and post-course surveys.

Since the questionnaire was built from two separate instruments, of which only one focuses on proving and neither had been used in the Finnish context before, we decided to perform a factor analysis to extract the attitude dimensions that the instrument was measuring and to estimate the validity of the items. The factor analysis was initially performed on the pre-course survey using R software (R Core Team, 2020) and the “psych” package (Revelle, 2020). We assumed a continuous scale for the responses and used Pearson correlation coefficients for the factor analysis, as suggested by Rhemtulla et al. (2012). To extract the factors, the maximum likelihood method was used, and since the attitudes were assumed to have non-zero correlations with each other, an oblique promax rotation was chosen (Costello & Osborne, 2005).

To find the number of factors, a scree plot together with parallel analysis of the eigenvalues was used. After making sure the suggested factors were distinct and coherent and reflected a theoretically justifiable construct, a 4-factor solution was chosen. Then, the set of items was reduced as follows. Items with a loading smaller than 0.6 were removed, resulting in a simple structure with each item loading mainly onto one factor. Some items loaded onto factors that conflicted with their original thematic contexts. For example, an item originally reflecting “enjoyment” in ATMI loaded onto the same factor as items reflecting “sense of security” and “self-efficacy.” Upon inspection, all such items were considered poorly worded or translated, and were therefore removed. Lastly, the items with the smallest loadings were removed to keep only six items for each factor. The final factor loadings are given in Table 8 in the Appendix. To confirm the factor structure, the same factor analysis was conducted on the post-course questionnaire. All items that were kept in the instrument loaded onto the same factors as in the pre-course questionnaire, so the factor structure was considered reliable. Finally, average scores were computed for the four factors to be used in the main analyses.
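The mechanical part of the item reduction described above can be illustrated with a short sketch. It is written in Python rather than R and is not the code used in the study. The function name `reduce_items`, the item labels and the loadings are all hypothetical, and the manual step of removing items that loaded onto thematically conflicting factors is not automated here.

```python
from collections import defaultdict

def reduce_items(loadings, threshold=0.6, keep_per_factor=6):
    """Assign each item to its primary factor and keep the strongest items.

    loadings: dict mapping item name -> list of loadings, one per factor.
    Returns a dict mapping factor index -> retained item names, ordered
    from largest to smallest absolute primary loading.
    """
    by_factor = defaultdict(list)
    for item, row in loadings.items():
        # Primary factor = the factor with the largest absolute loading.
        primary = max(range(len(row)), key=lambda j: abs(row[j]))
        strength = abs(row[primary])
        if strength >= threshold:  # drop items loading below the threshold
            by_factor[primary].append((strength, item))
    retained = {}
    for factor, items in by_factor.items():
        items.sort(reverse=True)  # strongest loadings first
        retained[factor] = [name for _, name in items[:keep_per_factor]]
    return retained

# Made-up loadings for five hypothetical items on two factors.
example = {
    "ITEM_A": [0.82, 0.05],
    "ITEM_B": [0.71, -0.10],
    "ITEM_C": [0.45, 0.30],   # primary loading below 0.6, so it is removed
    "ITEM_D": [0.02, 0.77],
    "ITEM_E": [-0.08, 0.64],
}
print(reduce_items(example))
```

With the default arguments, the threshold of 0.6 and the limit of six items per factor follow the values stated in the text; everything else in the example is illustrative.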

The factors were named based on their naming in the original instruments, as well as an interpretation of the items. The first factor contained all of the PSE items and some of the ATMI “sense of security” items. This factor was named Self-efficacy. The second factor contained the remaining ATMI “sense of security” items. These items reflected a negative sentiment and were negatively coded in the original instrument, so the factor was named Anxiety. The third factor contained only ATMI “value” items, so this factor was named Appreciation. The final factor contained ATMI “enjoyment” and “motivation” items. Based on the wording of the items finally chosen for the factor, the name Motivation was chosen. The origin and example items for each factor are given in Table 1. The complete item sets are included in Table 9 in the Appendix.

Table 1 The four attitude variables: Self-efficacy, Anxiety, Appreciation, and Motivation, their sources and example items in the final instrument

4.4 Main analyses

Main analyses were conducted with R software (R Core Team, 2020), with the help of the package “rstatix” (Kassambara, 2020). To measure the relationship between high school grade and the attitude variables at the beginning of the course, we performed four linear regression analyses with the four attitudes as outcome variables and high school grade as the predictor. Next, we checked that there was no significant difference in the high school grades between the two genders using a Wilcoxon rank-sum test. This allowed us to compare the attitude variables between genders directly with four separate independent-samples t-tests. Normality of residuals was tested for each attitude variable with the Shapiro–Wilk test and by examining quantile–quantile plots. Good levels of normality were observed for motivation and self-efficacy. For appreciation and anxiety, normality was poorly attested, but still considered reasonable for using t-tests. The variances of attitude scores in the two groups were equal, as assessed by Levene’s test. For each outcome variable, there were individual measurements that could be considered outliers, but we estimated that they would not have a large effect on the results. The t-tests were done using the Welch approximation to calculate the effective degrees of freedom for all attitude variables. Effect size was measured with Cohen’s d (small 0.2, medium 0.5, large 0.8).
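For readers less familiar with these statistics, the following Python sketch shows how a t statistic with Welch-approximated degrees of freedom and Cohen’s d can be computed for two independent groups. It is a minimal illustration, not the analysis code of the study (which used R). The function names and the data are hypothetical, and Cohen’s d is assumed here to use the pooled standard deviation, which the text does not specify.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the effective degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation (assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Made-up attitude scores on the 1-5 scale for two hypothetical groups.
group_1 = [2.0, 3.0, 4.0, 3.0]
group_2 = [3.0, 4.0, 5.0, 4.0]
t, df = welch_t(group_1, group_2)
d = cohens_d(group_1, group_2)
print(f"t = {t:.3f}, df = {df:.1f}, d = {d:.3f}")
```

The p-value would then be read from a t distribution with `df` degrees of freedom, which requires a statistics library and is omitted from this sketch.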

To measure the change in the attitude variables during the course, we performed four mixed-design analyses of variance (ANOVA) with gender as the between-subjects variable and time of measurement (beginning or end of course) as the within-subjects variable. The outcome variables were the four attitudes. For the end-of-course measurements, only the self-efficacy variable had properly normally distributed residuals. However, the distributions of the other three attitude variables were considered acceptable for ANOVA. Motivation was normally distributed at the beginning but not at the end of the course. The variances of the attitude scores within the two genders were found to be equal using Levene’s test. Also, the covariances were homogeneous, as per Box’s M-test. Sphericity was tested using Mauchly’s test and corrected for as needed as part of the ANOVA procedure. Effect sizes were measured with generalised eta squared (small 0.01, medium 0.06, large 0.14).
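To build intuition for the gender × time interaction in such a design, it may help to recall a standard result: with two groups and two measurement occasions, the interaction F statistic equals the square of a pooled-variance t statistic comparing the pre-to-post change scores of the two groups. The Python sketch below illustrates this equivalence only. It is not the mixed-design ANOVA procedure used in the study, and the function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def interaction_t(pre_a, post_a, pre_b, post_b):
    """Pooled-variance t statistic comparing pre-to-post change between two groups."""
    change_a = [post - pre for pre, post in zip(pre_a, post_a)]
    change_b = [post - pre for pre, post in zip(pre_b, post_b)]
    na, nb = len(change_a), len(change_b)
    pooled = ((na - 1) * variance(change_a) + (nb - 1) * variance(change_b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (mean(change_a) - mean(change_b)) / se
    return t, na + nb - 2  # t statistic and its degrees of freedom

# Made-up pre- and post-course scores for two hypothetical groups of three students.
t_change, df_change = interaction_t([2, 3, 3], [3, 3, 4], [3, 2, 4], [3, 3, 4])
f_interaction = t_change ** 2  # corresponds to F(1, df_change) for the interaction
print(round(t_change, 3), df_change, round(f_interaction, 3))
```

The main effects of gender and time, and the generalised eta squared effect sizes, are not covered by this sketch.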

To examine the effect of the attitude variables on performance, we used a binary logistic regression analysis with achievement level in the final course project as the outcome variable. The predictor variables were gender, high school grade, and the attitude variables measured at the beginning of the course. We chose to use the pre-course variables as predictors because the post-course responses might have already been affected by the project work, thus hindering causal inferences. The outcome variable was created by dividing the possible scores (0, 5, 10 or 15) into two categories: low (0, 5, 10) and high (15). The high category corresponded to good understanding of the course material and a decent ability to write and understand mathematical proofs. The low category corresponded to less understanding and a poor or modest ability to deal with proofs. With this division, both high and low categories had a similar number of students.
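The dichotomisation of the outcome and the form of a binary logistic model can be sketched as follows. This Python example is illustrative only. The function names are hypothetical, and the intercept and coefficients are invented values, not estimates from the study.

```python
import math

HIGH_SCORE = 15  # per the text, only the maximum score counts as "high"

def achievement_level(score):
    """Map a final-project score (0, 5, 10 or 15) to 1 = high, 0 = low."""
    if score not in (0, 5, 10, 15):
        raise ValueError(f"unexpected project score: {score}")
    return 1 if score == HIGH_SCORE else 0

def probability_high(intercept, coefficients, predictors):
    """Logistic model: P(high) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * predictors[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and predictor values, NOT results from the study.
coefs = {"self_efficacy": 0.5, "anxiety": -0.3}
student = {"self_efficacy": 3.0, "anxiety": 2.0}
levels = [achievement_level(s) for s in (0, 5, 10, 15)]
p_high = probability_high(-0.9, coefs, student)
print(levels, round(p_high, 3))
```

In such a model, exponentiating a coefficient gives the multiplicative change in the odds of a high outcome per one-unit increase in the corresponding predictor, holding the other predictors fixed.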

Throughout the analyses, we used the traditional alpha level of 0.05 for statistical significance. However, as this was exploratory research, we reported all p-values and considered them potentially informative up to 0.1. For the same reason, we did not apply any statistical correction for multiple testing, in order not to overlook any promising candidates for further research (Streiner & Norman, 2011). However, we strove to avoid data dredging and making strong claims about results that were close to the chosen alpha level, on either side.

5 Results

5.1 Descriptive statistics

Table 2 gives the means, standard deviations and Cronbach’s alpha coefficients of the attitude variables, as well as Pearson’s correlation coefficients between the variables. The alpha coefficient measures the internal consistency of the variable, or the interrelatedness of the items comprising the variable. Information for gender (female/male), high school final examination grade, and performance in the final project (low/high) is also included in the table.

Table 2 Means, standard deviations, Cronbach’s alpha coefficients, and Pearson’s correlation coefficients of the study variables

Before the main analysis, we observed from the means alone that Self-efficacy seemed to have increased during the course (from 2.70 to 3.01) and Anxiety to have diminished (from 2.43 to 2.25). Appreciation remained high throughout the course (3.87 pre-course, 3.92 post-course). Motivation increased marginally (from 3.01 to 3.09). Anxiety had the largest standard deviation in both surveys (0.93 pre-course, 0.92 post-course), and Motivation had a similar standard deviation in the post-course survey (0.92). Cronbach’s alpha coefficients were all above 0.88, which indicates strong internal consistency of the attitude variables.
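For readers unfamiliar with the coefficient, Cronbach’s alpha for a scale of k items can be computed from the item variances and the variance of the summed score, as in the following minimal Python sketch. The function name and the responses are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, aligned so that position i in every
    list belongs to the same respondent.
    """
    k = len(items)
    item_var_sum = sum(variance(scores) for scores in items)
    # Total score of each respondent across all items
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses of five students to three Likert items
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

The coefficient approaches 1 when the items vary together across respondents, so that most of the variance of the summed score comes from covariance between items rather than from the individual item variances.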

Correlations between the attitude variables were all non-zero with p < 0.01. The strongest positive correlations were between Motivation and Self-efficacy (0.57 pre-course, 0.66 post-course), and between Motivation and Appreciation (0.59 pre-course, 0.56 post-course). The strongest negative correlations were between Anxiety and Self-efficacy (−0.55 pre-course, −0.64 post-course), and between Anxiety and Motivation (−0.55 pre-course, −0.62 post-course).

Based on the correlation coefficients, high school final examination grade correlated with all attitude variables apart from Appreciation: positively with Self-efficacy and Motivation, and negatively with Anxiety. Gender correlated with Self-efficacy and Motivation, with male students tending to have larger values for both attitudes. Performance in the final project correlated positively with high school final examination grade, as well as with the same attitude variables as the high school examination grade.

5.2 Attitudes towards mathematical proof at the beginning of the course

To study the effect of previous performance on the attitude variables, we tested for a linear relationship by regressing each of the four attitudes on the high school final examination grade. The results are given in Table 3. We found that high school grades had a positive linear relationship with Motivation and Self-efficacy, and a negative relationship with Anxiety. The relationship with Appreciation was not significant (p > 0.05).

Table 3 Summary of linear regressions for high school grades predicting attitudes towards proving

After confirming the linear relationship between high school grades and the attitude variables, a non-parametric two-sample Wilcoxon rank-sum test was performed to see if there were any differences in high school grades between genders. No differences were found (p = 0.93). This test was chosen because the grades were not normally distributed within either gender, according to the Shapiro–Wilk normality test (p < 0.001 in both groups).

Four t-tests were then performed to compare attitudes towards proving between genders. As presented in Table 4, male students exhibited significantly higher Self-efficacy (t(253.0) = −2.76, p = 0.006, Welch correction) than female students. Male students also had significantly higher Motivation (t(252.0) = −2.14, p = 0.034, Welch correction) than female students. Differences in Appreciation and Anxiety were non-significant. Effect sizes were small.

Table 4 Summary of t-test results for the effect of gender on attitudes towards proving

5.3 Changes in attitudes during the course

To assess the change in the attitude variables between the beginning and the end of the course among both genders, a mixed-design ANOVA was used, with time as a within-subjects variable and gender as a between-subjects variable. Table 5 shows the means and standard deviations of the attitude variables at both time points broken down by gender.

Table 5 Means and standard deviations of the attitude variables at the beginning and at the end of the course, broken down by gender

During the course, Self-efficacy increased significantly (large effect) and Anxiety decreased significantly (medium effect), as summarised in Table 6. The increase in Motivation was borderline significant (p = 0.051, small effect), whereas the change in Appreciation was non-significant. There was no significant interaction between time and gender, meaning that the attitude variables changed similarly for female and male students.

Table 6 Summary for the mixed-design ANOVA comparing the attitudes towards proving at the beginning and at the end of the course and between genders

5.4 Achievement predicted by prior attitudes, high school grade and gender

To analyse to what extent 1) gender, 2) high school grade and 3) attitudes at the beginning of the course predicted achievement in the final project of the course, a logistic regression was conducted. In the logistic regression, the predicted outcome was defined as “high” achievement (i.e., score of 15 on a discrete scale of 0, 5, 10, 15) in the final project. Table 7 presents a summary of the results.

Table 7 Binary logistic regression predicting high achievement in the final project of the course in terms of gender, high school grade, and attitudes towards proving at the beginning of the course

High school grade and Motivation were the only significant predictors of achievement, when measured holding all other variables constant. The odds of getting a high score on the final project were increased by a factor of 1.5 per unit increase in high school grade, and a factor of 1.6 per unit increase in the Motivation score. The effects of gender, Self-efficacy, Anxiety and Appreciation on achievement were non-significant. The average scores in the final project were 10.6 for women and 11.0 for men.
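To illustrate how such odds ratios translate into probabilities, the following minimal Python sketch applies the two reported factors (1.5 and 1.6) to a baseline probability. The baseline of 0.5 is a hypothetical value chosen for illustration, not an estimate from the fitted model, and the function name is our own.

```python
from math import log


def prob_after_increase(p0, odds_ratio, units=1.0):
    """Probability obtained when the odds of p0 are multiplied by odds_ratio ** units."""
    odds = p0 / (1 - p0) * odds_ratio ** units
    return odds / (1 + odds)


# Odds ratios as reported in the text
or_grade, or_motivation = 1.5, 1.6

# The corresponding logistic regression coefficient is the log of the odds ratio
beta_grade = log(or_grade)

# Hypothetical baseline probability of a high score (assumption for illustration)
baseline = 0.5
p_grade = prob_after_increase(baseline, or_grade)
p_motivation = prob_after_increase(baseline, or_motivation)
print(f"{baseline:.2f} -> {p_grade:.2f} (grade), {p_motivation:.2f} (Motivation)")
```

Because the odds ratio acts multiplicatively on the odds, the resulting change in probability depends on the baseline: the same factor produces a smaller shift in probability when the baseline is close to 0 or 1.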

6 Discussion

Values and attitudes are important drivers of studying and learning. In this study, we focused on beginning university students’ attitudes towards proving, as proof-based reasoning is generally recognised as a particular challenge in the transition to university mathematics. Our results suggest that some attitude variables may indeed have an effect on performance and that they may themselves be affected by teaching or the learning environment more generally. We also found that attitudes towards proving are similar to attitudes towards mathematics in that they show a clear gender difference at the time of entering university.

6.1 Defining attitudes through measurement

Instead of deriving the attitude variables used in this study from a particular theory of attitude, we defined them by combining and modifying existing attitude instruments that we considered useful for this study. The resulting four attitude variables (Self-efficacy, Anxiety, Appreciation and Motivation) are all related to well-known mathematics affects. Self-efficacy and Appreciation can be seen as cognitive or metacognitive dispositions, with Self-efficacy directed to oneself (“I believe I can”) and Appreciation to the object of learning (“I think it is useful”). Anxiety and Motivation, on the other hand, are more emotional, with Anxiety corresponding to almost visceral reactions (“I feel repulsed”) and Motivation indicating a desire to seek out proving-related situations in the future (“I find it interesting”).

The new instrument performed relatively well overall, with clear distinctions between the four attitude dimensions and good internal consistency of each dimension. Self-efficacy and Motivation were approximately normally distributed. Appreciation was cut off at the high end (ceiling effect) because of the large number of positive responses, and Anxiety was somewhat cut off at the low end (floor effect), with a less clear peak. These shortcomings should be addressed if the instrument is developed further.

6.2 Attitudes were connected with prior performance and gender

At the beginning of the course, students varied in their Self-efficacy beliefs, Anxiety towards proving and Motivation. Their high school grades correlated positively with Self-efficacy and Motivation, and negatively with Anxiety. So, even though Finnish school mathematics does not include a lot of proofs, high school performance is linked with students’ attitudes towards proving.

Our results align with Bandura’s (1997) hypothesis of self-efficacy being affected by a person’s prior experiences. They also corroborate the findings by Stylianou et al. (2015) who showed that high-performing students were more likely to hold more positive beliefs of themselves as learners of proof than low-performing students. Furthermore, our results are in line with previous studies that have shown that performance in mathematics is connected with self-efficacy (Hackett & Betz, 1989; Pajares & Graham, 1999), anxiety (Ma & Kishor, 1997; Ho et al., 2000; Miller & Bichsel, 2004) and motivation towards mathematics (Middleton & Spanias, 1999; Murayama et al., 2013; Singh et al., 2002). On the other hand, Appreciation of proofs did not vary a lot in the sample. Our result differs from the study by Stylianou et al. (2015), in which high-performing students appreciated proof more than low-performing students.

When we measured gender differences in attitude variables at the beginning of the course, we found that male students reported significantly higher Self-efficacy and Motivation than female students. The gender difference in Self-efficacy is in line with previous studies which have reported women having lower self-efficacy in mathematics than men (Betz & Hackett, 1983; Frost et al., 1994; Leder, 1995; Lahdenperä, 2018; Arens et al., 2022). It seems that the same gender gap exists with regard to proving. Similarly, Motivation towards studying mathematics is known to decrease among girls during middle school (e.g., Middleton & Spanias, 1999), and there may be a similar reason behind our results on proving.

We could not detect a significant gender difference in Appreciation or Anxiety towards proving. Many studies have found such a gender difference in mathematics anxiety (Hembree, 1990; Wigfield & Meece, 1988; Devine et al., 2012). In our study, there was an absolute difference of 0.2 standard deviation units in both Appreciation and Anxiety at the beginning of the course, with males having a higher value in Appreciation and a lower value in Anxiety. It may be that our instrument was simply not sensitive enough to capture these differences as statistically significant with the current sample size. However, it may also be that students had not had a chance to experience proving at this stage, and therefore female students had not yet formed a negative response to it.

6.3 Interplay between learning environment and attitudes

During the course, students’ Self-efficacy increased and Anxiety decreased on average. Motivation and Appreciation also increased, but these changes were not statistically significant. As we did not have a control group and many students were taking other proof-based courses at the same time, we cannot conclude that these effects were due to this particular course. However, as proving is an activity rarely considered outside mathematics and the focus course was designed as an introductory course on proving, we are willing to hypothesise that at least part of the changes in these attitude variables were due to the learning environment in this course.

In the context of the focus course, there are several possible reasons for the increase in Self-efficacy that are in line with Bandura’s (1997) model, which identifies prior attainments, vicarious experience and social persuasions among sources of self-efficacy. Learning proof was scaffolded with proof frameworks and tasks of increasing difficulty. It may be that these allowed students to experience early successes, which affected their self-efficacy positively (see Selden & Selden, 2013). The students received support and encouraging feedback from the teaching team in a collaborative learning space. They also gave written peer feedback that allowed them to see their peers’ work and compare their own capabilities to those of other students. Finally, the lecturer emphasised that making mistakes is part of the process of proving. In addition to increasing Self-efficacy, the same elements of the learning environment may also have decreased students’ Anxiety (see Supekar et al., 2015).

In the final project, a high level of Motivation at the beginning of the course predicted good performance, after controlling for prior skills (high school grade) and while keeping the other attitudes fixed. We did not find a similar effect with the other three attitude variables, although in many previous studies both self-efficacy (Pajares & Graham, 1999) and anxiety (e.g., Ma & Kishor, 1997; Ho et al., 2000; Miller & Bichsel, 2004) have correlated with achievement. This may result from the fact that we controlled for prior performance in our analysis. For example, it may be that the students with high Self-efficacy did well in the final project, but if they also had strong skills at the beginning of the course, this effect was partially diminished in the analysis. We also acknowledge the high correlation between our attitude variables, which may raise a problem of multicollinearity.

We conclude that Motivation was unique in that it helped students to surpass their prior skill level in the final project. As they had a relatively long time to work on the project by themselves and with support from the teaching team, high Motivation would probably help students work longer and harder, thereby achieving better results.

6.4 Implications for research and practice

Our study has indicated that attitudes towards proving can have an additional effect on performance in certain kinds of proving tasks. Also, students’ attitudes towards proving can improve during their studies. These results make attitudes worthwhile to study and also to consider when planning teaching and assessment. The instrument developed in this study provides a quantitative tool for studying students’ attitudes, which can be used and developed further in future studies. We suggest developing this tool towards one based on a comprehensive theoretical grounding, such as that of Di Martino and Zan (2010).

We detected an increase in students’ Self-efficacy and a decrease in their Anxiety. In light of earlier literature, we hypothesise that collaborative learning environments in which the students interact with their peers and teachers have a positive effect on students’ attitudes. Also, making the learning environment safe for the students by, for example, discussing factors that may cause anxiety and encouraging students to share their unfinished ideas, likely supports a beneficial development of attitudes. In the future, qualitative studies such as interviews and observations could reveal more about the reasons behind the changes in students’ attitudes.

Our results show that, as in many areas of mathematics, there is a gender gap in students’ attitudes towards proving: female students suffer from more detrimental dispositions towards proving. Teachers should be aware of this inequity and work actively towards changing the systemic deficiencies causing it. Based on prior research on gender gaps in students’ attitudes towards mathematics, student-centred learning environments (Lahdenperä, 2018; Laursen et al., 2014) can help reduce the difference between females and males. However, when applying student-centred methods, attention needs to be paid to the ways in which oppressive cultural narratives about women in mathematics affect women’s ability to participate in the classroom (see Reinholz et al., 2022).