The Properties of Powers: Didactic Contract and Gender Gap

National and international large-scale assessments of mathematics show that, in most nations, males achieve better results than females, and Italy is among the countries with the largest gaps. Many research studies in mathematics education have analysed this issue, using both quantitative and qualitative methods to understand the sources and characteristics of this gap. This study focuses on a specific Grade 10 task that requires algebraic manipulations of powers with the same base. Item-level analysis enables the study of gender differences on specific content, before using the lenses of mathematics education theories to interpret macro-phenomena emerging from standardized assessment results. The quantitative analysis, carried out using the Rasch statistical model, highlights a gender gap in favour of males in this task and, furthermore, a difference between males and females in the choice of incorrect options; the interviews conducted provide a key to understanding this phenomenon in terms of didactic contract.

use this perspective to interpret the phenomena highlighted by the quantitative analysis and, in particular, we will investigate the causes of these phenomena using the didactic contract construct defined by Brousseau (Brousseau, 1997; Education Committee of the EMS [EMS-EC], 2012) as the main interpretative key. In line with Brousseau's definition (1980), we consider the didactic contract as the pupil's expectations of teacher behaviour and the teacher's expectations of student behaviour; this construct was introduced to interpret possible reasons for the specific failure in mathematics of some students and it seems very appropriate for interpreting the phenomena we investigated.
The didactic contract is one of the solid findings of mathematics education (EMS-EC, 2012) and it has significant international relevance (Sarrazy, 1995); indeed, this construct is useful for interpreting several classroom situations and, thus, several difficulties encountered by students (Brousseau, 1997).
The socio-cognitive dimension is fundamental for the definition and evolution of this notion (Sarrazy, 1995): the didactic contract regulates classroom activities, influencing the behaviour of the teacher and students, also regarding acquired knowledge. Indeed, the didactic contract has a strong influence on students' learning processes, and its effect in mathematics activities should not be underestimated. The didactic contract imposes rules of behaviour, but these rules and norms are often implicit, meaning that they can in fact become obstacles in the teaching/learning processes. Although these rules may come from didactic practices and may also be useful for students in some situations, difficulties connected to the didactic contract are often due to the application of these rules in a new context where they are not appropriate.
In several studies, the didactic contract was used to interpret phenomena in primary school, but some studies have highlighted that it also emerges in higher scholastic grades (D'Amore, Fandiño, Marazzani & Sarrazy, 2010; De Vleeschouwer & Gueudet, 2011) and that it is possible to observe its existence and effects not only through qualitative interviews but also in large-scale assessment processes (Ferretti & Bolondi, 2019).
In this paper, we will use the didactic contract as a theoretical lens through which we interpret students' answers to a specific task taken from a large-scale assessment test at grade 10. The quantitative analysis of the task allows us to analyse students' answers also as a function of their ability across the whole test, focusing on differences between males and females. The didactic contract is here used to interpret the phenomena highlighted by the quantitative analysis, and the interviews confirm the accuracy of this interpretation. In detail, we analysed the students' approach to solving the proposed task, investigating the presence of behaviours attributable to contractual rules created by the students themselves. These implicit didactic contract rules are not imposed by teachers, but are created by students over time in response to recurrences that have generated standard problem models. These norms are so firmly established that, as already highlighted in the literature (Ferretti & Bolondi, 2019), they also emerge during standardized assessment situations.

Gender Gap in Mathematics
The strong disparity between the mathematical results of males and females (in favour of males) is addressed in the international literature and highlighted by standardized assessment results (Mullis, Martin, Foy, & Hooper, 2016). International mathematics standardized assessment systems, OECD-PISA and IEA-TIMSS, confirm the trend in both primary and secondary schools (OECD, 2016a).
Many studies in recent years (e.g. Leder & Forgasz, 2008; Forgasz et al., 2010) have focused on investigating the causes behind the better results of male students in mathematics; this growing interest in the subject can also be linked to widespread evidence emerging in standardized assessment tests. It is thanks to the data and comparisons of results in different countries that it is possible to investigate the characteristics and causes of the gender gap. From the data of international standardized assessment tests, it emerges that the gap between test results of males and females is not uniform across all the school systems investigated; in most countries, the gap is in favour of males, but there are exceptions where it is in favour of females (OECD, 2016a). This fact supports the theory that causes of a biological and physiological nature, considered in some studies (e.g. Baron-Cohen & Wheelwright, 2004), do not appear to be a predominant factor in the emergence of the gap.
If the causes were predominantly biological in nature, the differences should be more or less the same in all countries, a hypothesis which is contradicted by the results of the OECD-PISA and IEA-TIMSS tests (Hill, Corbett, & Rose, 2010). The non-homogeneity in the distribution of the gender gap also supports the numerous studies that uphold the strong impact of social and cultural factors (e.g. Guiso, Monte, Sapienza, & Zingales, 2008). For example, several studies have shown that the gender gap in mathematics is closely linked to the emancipation of women in society; in societies where gender equality is reached, this gap tends to disappear (e.g. Guiso, Monte, Sapienza, & Zingales, 2008; González de San Román & De La Rica, 2012).
The performance gap is also strongly influenced by the beliefs and attitudes of teachers, parents and students towards mathematics, which are often shaped by gender stereotypes (e.g. Jacobs & Bleeker, 2004; Tomasetto, 2013), and by metacognitive factors closely linked to mathematics (Herbert & Stipek, 2005; Hill et al., 2010; OECD, 2015). A lower level of self-efficacy and self-confidence in mathematics (OECD, 2015) may lead girls to be more afraid of making mistakes, thus preferring well-known resolution strategies already adopted in the classroom rather than trying out new strategies in solving problems (e.g. Bell & Norwood, 2007). Finally, factors closely related to the classroom context, such as the curriculum, teaching and evaluation practices, may have a different impact on males and females (Leder, 1992; Leder & Forgasz, 2008). Even micro-social factors that arise in the classroom context can lie at the basis of gender differences in mathematics: a stronger attachment of girls to teaching practices and to the teacher can, for example, lead girls to be more influenced by misconceptions and the didactic contract (Bolondi, Cascella, & Giberti, 2017).
Italy displays one of the most marked gender gaps (Mullis et al., 2016; OECD, 2016b; Giberti, 2019). The clear mathematical gap between males and females in Italy is also confirmed in the various school grades by the results of the national standardized assessment INVALSI surveys.

Research Questions
In this paper we assume that the didactic contract can have a stronger influence on females than on males, focusing particularly on its effect on a specific mathematics task.
The links between didactic contract and gender differences are varied and support this hypothesis. Firstly, the relevance of social and cultural factors in determining gender differences in mathematics also applies at classroom level, where micro-social factors (strictly related to the milieu habits, classroom practices and teacher-student relationship) have a different influence on male and female behaviour, both in classroom work and in assessment situations (e.g. Bell & Norwood, 2007; Bolondi et al., 2018). Moreover, the relationship with the teacher and with the discipline itself is another important connection between these two phenomena: the literature shows the importance of metacognitive factors such as math anxiety and math self-efficacy (e.g. Hill et al., 2010) as well as the relevance of teachers' beliefs for students' performances and, also in this case, we observe that these factors disadvantage females more than males in learning mathematics. Consequently, phenomena such as the didactic contract (emerging within classroom habits and related to implicit and explicit rules set by the teacher) have a stronger influence on females, and may constitute an obstacle to learning processes. This is also observed in this specific task, in which the gender gap in favour of boys might be explained by the influence of the didactic contract and also by the girls' reluctance to eschew didactic practices and rules established by classroom habits (Bolondi et al., 2018; Giberti & Ferretti, 2019).
Finally, in recent studies we observed that the didactic contract is particularly influential on students with medium and medium-high ability levels (Giberti & Ferretti, 2019); this is in line with previous literature and with the very definition of didactic contract, i.e. arising from rules and norms of the classroom and teacher habits, influencing mostly those students who follow the teacher's lessons and his/her instructions but are not able to discern correctly when a rule should be applied or not. On the other hand, the literature on gender differences highlights that the gap is greater for students with high ability levels; furthermore, the gender gap is almost null at the beginning of schooling but increases over the years (OECD, 2016b), thus confirming the relevance of scholastic factors.
Our assumption, also based on previous research (Giberti & Ferretti, 2019), is that a lower level of self-efficacy and self-concept in mathematics could lead girls to be more inclined to take "safe refuge" in didactic routines and practices, and be more prone to follow what the teacher says (or does not say, in the case of implicit rules). The expectations which, explicitly or not, arise from the classroom environment constitute this "safe refuge" but can also become a limitation on students' autonomy and an obstacle to learning, as other studies on the didactic contract show (Ferretti, 2020).
In this work, we investigate gender differences in answering a specific task involving an operation with powers, interpreting these phenomena in terms of didactic contract: girls, when having to operate with powers of the same base and high exponents, are more driven by didactic practices to use power properties even though this is not appropriate for the case at hand. In the first part of our analysis, the quantitative tools highlight gender differences (in favour of males) in a specific task belonging to a grade 10 INVALSI math test. In the second part, we will interpret statistical evidence via a qualitative analysis based on interviews. The first research questions are related to the quantitative analysis: (1) Does the quantitative analysis performed via the Rasch Model reveal a gender gap in terms of ability level of the students? (2) Can the gender gap observed regarding the correct answer be due to differences in the selection of wrong options for males and females? In other words: Are males and females attracted by the same incorrect answers?
At first glance, the postulated interpretation of the task highlights that difficulties of students nationwide might be related to didactic contract. The interviews performed in the second part of our work are used to investigate if, and how, the behaviour of students in this task has been influenced by didactic contract, before answering the following questions: (3) Do interviews confirm that the difficulties encountered by students in this task are due to the didactic contract? (4) Are reasons and type of error linked to a specific difficulty and thus to a specific wrong option, which differs between males and females?

Research Methods
In this research we adopted a mixed-methods approach (Johnson & Onwuegbuzie, 2004) based on the integration of two different analyses. The first part of our research is based on the results of standardized assessment and consists of a quantitative analysis. Using the Rasch model and Differential Item Functioning (DIF) analysis on a large national sample of students (more than 40,000 students) we observe macro-phenomena on a specific item of the test, which can be linked to solid findings of mathematics education theories (EMS-EC, 2012). In particular, the quantitative analysis allows us to examine students' behaviour and answers in relation to their ability on the whole test and, furthermore, to characterize gender differences on the basis of student ability. In this way we can make conjectures concerning the specific difficulties encountered by students in a task and the cognitive processes linked to each possible answer, also considering differences between male and female responses. These conjectures are then investigated via a qualitative analysis based on interviews of students of the same grade, with the aim of understanding the cognitive processes adopted by the students and then the reasons for the emergence of the gender gap in this specific task.
The structure of our mixed methodology, in accordance with Johnson and Onwuegbuzie (2004), is thus QUAN→QUAL, with the integration of two different steps of equal importance but with different goals: highlighting the macro-phenomena on a large sample representative of the population, and then interpreting these phenomena by interviewing students.

Quantitative Analysis
The task analysed in this paper belongs to an INVALSI test administered in 2012 at grade 10. The test comprised 45 items covering the entire content domain (Space and Shapes, Data and Uncertainty, Numbers, Relations and Functions) and was constructed and analysed by the INVALSI team. We consider the results of the INVALSI sample, which is composed of 41,812 students selected to be representative of the whole population nationwide. Moreover, using INVALSI sample data guarantees the regularity of the administration process and data entry, as these are performed by INVALSI experts. A more detailed description of the construction and representativeness of the INVALSI sample is reported in the technical report (INVALSI, 2012) and based on the work of the INVALSI statistical team. The analysis of the validity and internal consistency of the whole grade 10 INVALSI test in 2012 is also reported in the technical report; in particular, Cronbach's alpha is equal to 0.89.
Our quantitative analysis is based on the Rasch model (Rasch, 1960), which is a logistic model often used to analyse tests and adopted also by the INVALSI statistical team for the main analysis of results. The Rasch model belongs to Item Response Theory and is useful for analysis both at test and item level as it allows a joint estimate of the difficulty of the items and of the ability of each student by placing them on the same scale from −4 to +4. Student ability is measured across the whole test and is directly comparable with item difficulty: for instance, a student with an ability parameter equal to 0.8 has a 50% probability of answering correctly an item with a difficulty index of 0.8, while the same student has a probability higher than 50% of answering correctly an item with a difficulty parameter lower than 0.8. The Item Characteristic Curve (ICC), output of the Rasch Model, expresses the probability of responding correctly to a specific item depending on the difficulty of the item itself and the ability of the respondent, measured across the entire test. The ICC plot (see, for example, Figs. 1 and 2) reports students' ability on the x-axis and the probability of choosing the correct answer on the y-axis. Together with the ICC, we can also represent empirical data relative to the same item, reporting for each decile of the population (grouped on the basis of their ability across the whole test) the percentage of students who choose a specific answer to the task (correct and wrong options). These graphs are named Distractor Plots and are particularly important in analysing the fit of the item with the model: comparison of the ICC and the empirical line of the correct answer gives information regarding the way the theoretical curve of the item represents the empirical data.
Moreover, distractor plots are also useful for analysing the trend of incorrect answers as a function of students' ability (observing the empirical lines of the incorrect answers).
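The relationship between ability and difficulty described above can be sketched with the standard dichotomous Rasch formula, in which the probability of a correct answer is a logistic function of the difference between the two parameters. The function name `rasch_probability` and the sample values are illustrative assumptions; this is not the code used by INVALSI or jMetrik.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    theta: ability of the student (logits)
    delta: difficulty of the item (logits)
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# A student whose ability equals the item difficulty has a 50% chance.
print(rasch_probability(0.8, 0.8))  # 0.5

# The same student facing an easier item (lower delta) has a higher chance.
print(rasch_probability(0.8, 0.0) > 0.5)  # True
```

Evaluating this function over a grid of ability values for a fixed difficulty traces the ICC of that item.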
In this paper we used jMetrik 4.0 software to perform a Rasch analysis of the INVALSI test administered at grade 10 in 2012. We based our quantitative analysis on this model because it allows us to: (a) place all the students of the sample on the same scale according to their ability measured over the entire test; (b) produce the distractor plot of the task which is studied in this paper; (c) produce the distractor plot dividing males and females.
The plots resulting from this procedure allow the comparison of male and female answers as a function of their ability measured over the entire test. The x-axis of the plot reports the Rasch parameter for students' ability (measured using the Rasch model for the whole sample, to place males and females on the same ability scale). The sample was then divided by gender, and the two groups were further divided into deciles according to their ability. For each male and female decile, the plot reports the percentage of students choosing each option, which gives us the opportunity to observe the empirical trends of male and female answers with a direct comparison.
In particular, this last point is crucial in our analysis because it allows us to compare answers given by males and females on the basis of their ability and highlight differences not only in choosing the correct answer, but also in choosing incorrect options.
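The tabulation behind a distractor plot split by gender can be sketched as follows. This is a minimal illustration under stated assumptions, not the jMetrik procedure: the input layout (tuples of gender, ability and chosen option), the function name `distractor_table` and the toy data are all hypothetical, and the ability values are assumed to have been estimated beforehand on the whole sample.

```python
from collections import Counter, defaultdict


def distractor_table(records, n_groups=10):
    """Tabulate option percentages per ability group, separately by gender.

    records: iterable of (gender, theta, option) tuples, where theta is the
    ability estimated on the whole sample (hypothetical input layout).
    Returns {gender: [(mean_theta, {option: percentage}), ...]}.
    """
    by_gender = defaultdict(list)
    for gender, theta, option in records:
        by_gender[gender].append((theta, option))

    table = {}
    for gender, rows in by_gender.items():
        rows.sort(key=lambda row: row[0])  # order students by ability
        groups = []
        for g in range(n_groups):
            # Equal-count slices; with n_groups=10 these are deciles.
            start = g * len(rows) // n_groups
            stop = (g + 1) * len(rows) // n_groups
            chunk = rows[start:stop]
            if not chunk:
                continue
            counts = Counter(option for _, option in chunk)
            mean_theta = sum(theta for theta, _ in chunk) / len(chunk)
            shares = {opt: 100.0 * n / len(chunk) for opt, n in counts.items()}
            groups.append((mean_theta, shares))
        table[gender] = groups
    return table


# Tiny made-up data set, only to show the shape of the output.
sample = [
    ("F", -1.0, "D"), ("F", -0.5, "B"), ("F", 0.5, "C"), ("F", 1.0, "C"),
    ("M", -1.0, "B"), ("M", -0.5, "B"), ("M", 0.5, "C"), ("M", 1.5, "C"),
]
result = distractor_table(sample, n_groups=2)
print(result["F"][0])  # lowest-ability half of the female group
```

Plotting, for each gender, the mean ability of each group against the percentage of each option gives the empirical lines that are compared in the text. Because the groups are formed within each gender, the male and female points can sit at different positions on the common ability scale.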
Furthermore, using the same software, we also performed a Differential Item Functioning (DIF) analysis in order to compare the performances of the two subgroups (focal group: males; reference group: females) on each item of the test: "Differential item functioning (DIF) occurs when one group of examinees has a different expected item score than comparable examinees from another group. It indicates that an item is measuring something beyond the intended construct and is contributing to construct irrelevant variance." (Meyer, 2014, p. 69). INVALSI tests are built to be DIF-free, because the presence of a significant DIF is a threat to the validity of the test and DIF items interfere with the quality of the measured trait.
We used the Cochran-Mantel-Haenszel (CMH) statistic provided by jMetrik to test statistical significance. This procedure tests the null hypothesis that item scores are conditionally independent of group membership (in our case, gender). The CMH procedure might be influenced by the sample size (Meyer, 2014), so the common odds ratio is used to describe practical significance: no DIF is detected if this index is 1, while values larger/smaller than 1 indicate an item favouring the reference/focal group respectively. Based on CMH and common odds ratio results, all the items have been classified following the criteria suggested by Zwick and Ercikan (1989) 1 : "A" if the DIF is negligible, "B" if the DIF is moderate and "C" if they exhibit a high value of DIF and thus potential problems for the validity of the test.
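The A/B/C rules reported in the accompanying footnote (Meyer, 2014; Zwick & Ercikan, 1989) can be written as a small decision function. The sketch below only transcribes those thresholds; the function name `classify_dif`, its parameters and the sample values are hypothetical and do not reproduce jMetrik output.

```python
def classify_dif(p_value, odds_ratio, ci_lower, ci_upper):
    """Return 'A', 'B' or 'C' following the thresholds quoted in the footnote.

    p_value: CMH p value
    odds_ratio: common odds ratio
    ci_lower, ci_upper: bounds of its 95% confidence interval
    """
    if p_value > 0.05 or 0.65 < odds_ratio < 1.53:
        return "A"  # negligible DIF
    strong_low = odds_ratio < 0.53 and ci_upper < 0.65
    strong_high = odds_ratio > 1.89 and ci_lower > 1.53
    if strong_low or strong_high:
        return "C"  # large DIF
    return "B"  # moderate DIF: neither "A" nor "C"


# Made-up values, chosen only to exercise each branch.
print(classify_dif(0.001, 1.10, 1.05, 1.15))  # A
print(classify_dif(0.001, 0.60, 0.55, 0.64))  # B
print(classify_dif(0.001, 2.00, 1.60, 2.40))  # C
```

Note that with very large samples the p value is almost always small, so in practice the odds-ratio condition tends to decide whether an item falls in class A.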
We performed this analysis for the whole test, considering both the raw score and the Rasch parameter (theta) as matching variables; we thus compared male (reference) and female (focal) answers for each item of the test.
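To make the role of the matching variable concrete, the following sketch computes the textbook Mantel-Haenszel common odds ratio for one dichotomous item, stratifying examinees by a matching score. It is an illustration under assumptions, not the jMetrik implementation: the function name, the input layout and the group labels are placeholders, and the data are invented.

```python
from collections import defaultdict


def mh_common_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    responses: iterable of (group, stratum, correct), with group in
    {"reference", "focal"}, stratum the matching score, correct a bool.
    Values above 1 indicate an item favouring the reference group.
    Raises ZeroDivisionError if no stratum has a reference error
    together with a focal success.
    """
    # counts[stratum] = [ref_correct, ref_wrong, focal_correct, focal_wrong]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in responses:
        offset = 0 if group == "reference" else 2
        counts[stratum][offset + (0 if correct else 1)] += 1

    numerator = denominator = 0.0
    for a, b, c, d in counts.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# One stratum of made-up data: 3 of 4 reference and 1 of 4 focal examinees
# with the same matching score answer correctly.
data = (
    [("reference", 10, True)] * 3 + [("reference", 10, False)] * 1
    + [("focal", 10, True)] * 1 + [("focal", 10, False)] * 3
)
print(mh_common_odds_ratio(data))  # 9.0
```

Because the comparison is made within strata of equal matching score, a difference in overall percentages that comes only from one group reaching higher ability levels leaves the ratio close to 1.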

Qualitative Analysis
To confirm the quantitative analysis, we interviewed students of the same age as those who had tackled the same INVALSI task years earlier. Already in 1980, Ericsson and Simon (1980) supported the need to move beyond a quantitative approach in order to interpret the situations, and so we performed a qualitative analysis from a cognitive and meta-cognitive perspective. We interviewed 18 students attending 6 classes of upper secondary school (with different orientations): 7 Technical Institute students and 11 Liceo (lyceum) students. Teachers were asked to indicate two or three students per class with an "average" performance compared to the class trend. Task-based structured interviews were conducted (Goldin, 2000), which focused on how the pupils tackled the mathematical activity. The interviews took place individually and were recorded and transcribed with the students' consent; the presence of the voice recorder did not appear to affect the conduct of the interviews in any way. The structure of the interviews was planned in advance by the three interviewers; the students were given a questionnaire and asked to complete it. At the beginning of the interview, the students were asked to enter a survey ID name and were asked to carry out the tasks (Attachment 1) either aloud or in silence, explaining their strategies later. Most of the students opted for the second option, first carrying out all the tasks and then explaining aloud the reasoning and strategies implemented. Investigating the effects of didactic contract, in line with the theoretical framework (Brousseau, 1988), we tried to create a relaxed environment in all interviews. This made it possible to create conditions suitable for research; in fact, many aspects of the didactic contract are tied to the difficulty of making inner processes explicit and, very often, external factors are decisive. Moreover, as Goldin (2000) recommends, we tried to take precautions with regard to possible unforeseen events and, being willing to temporarily alter the planned structures of the interviews, we identified situations that led to useful observations for the purposes of research. As the author emphasizes, at the time of the interview, attention was focused on the process that the subject undertakes to arrive at the answer rather than on the correctness or otherwise of the answer to the mathematical task (ibidem); and it is precisely this fact that creates the possibility of investigating important issues in greater depth than allowed by other experimental means. Specifically, in the context of this research, this has been crucial in eliciting the feelings and beliefs of students, whether they made their strategies explicit or left them implicit.

1 The rules for this classification, in terms of the common odds ratio, are as follows (Meyer, 2014; Zwick & Ercikan, 1989):
"A" items have (a) a CMH p value greater than 0.05, or (b) a common odds ratio strictly between 0.65 and 1.53.
"B" items are neither "A" nor "C" items.
"C" items have (a) a common odds ratio less than 0.53 and an upper bound of the 95% confidence interval for the common odds ratio less than 0.65, or (b) a common odds ratio greater than 1.89 and a lower bound of the 95% confidence interval greater than 1.53.

Fig. 2 Distractor Plot by gender

Quantitative Results
The results of the first part of the quantitative analysis reported in the table below (Fig. 1) derive from the statistical analysis made by INVALSI using the software ACER ConQuest, which merges the results of the Rasch Analysis (Item delta and Distractor Plots) with the main indices of Classical Test Theory (Discrimination and Weighted MNSQ).
The item shows acceptable psychometric features (Barbaranelli & Natali, 2011;INVALSI, 2012). The percentage of correct answers is 35%, which is low if we consider the fact that this is a multiple-choice item with four options. The difficulty of the item is also confirmed by the item delta which is equal to 0.76. Moreover, we observe a strong difference between males and females in choosing the correct answer: 38% of males answered correctly as opposed to 31% of females. Option B was the most attractive to both males and females and there was no significant gender difference for option A. A large part of the gap observed is due to option D, which was chosen by 14% of males as opposed to 19% of females. Almost all the students answered this question and the percentage of missing answers was only 3%.
The distractor plot allows us to compare the students' choices on the basis of their ability level measured over the entire test (on the x-axis): the dotted lines represent the empirical trend of each possible answer (percentage of students for each decile who chose a specific option), while the solid line is the ICC output of the Rasch model. The comparison of the empirical line of the correct answer with the ICC curve highlights a slight over-discrimination: the model overestimates the probability of choosing the correct answer for students with medium-low ability levels and underestimates it for higher ability levels.
Furthermore, it is interesting to observe the trend of the incorrect options: they are all constant for low and medium ability levels, while they show a decreasing trend only from medium ability levels upwards. This means that students with low and medium-low ability display the same behaviour when tackling this task and their choices are similar in terms of percentage. Our elaboration of INVALSI data using jMetrik 4.0 software allows us to compare the distractor plot of males' answers with that of females' answers on the same ability scale created with the Rasch model (Fig. 2). It is interesting to note that the differences between males and females observed in terms of percentage are not constant across ability levels. First, the trend of the correct answer is almost the same for all ability levels; we observe a difference in favour of males only for low ability levels. Indeed, we observe that the difference in the correct answer is almost certainly due to the fact that males reach higher ability levels, and the highest deciles of males and females are at different ability levels. This factor also influences the trend of the other options: the male lines reach higher levels, while female deciles are more concentrated on medium values of the ability trait. Our analysis also shows that there is no significant difference in option A, while option B is preferred by males and option D is preferred by females, especially for medium and low ability levels.
Finally, we present the results of DIF analysis (Table 1). INVALSI tests are constructed to be DIF-free because the emergence of a significant DIF value in many items of the test would compromise the validity of the test itself. Table 1 demonstrates that all the items except three are classified as DIF-free (class A). There are only three items demonstrating moderate DIF, two in favour of females (B-) and one in favour of males (B+). Finally, as expected, no item shows a large value of DIF (class C); the presence of this kind of item would have affected the validity of the test.
The item used in our analysis (D21) is classified as A and thus presents a negligible DIF value. This is in line with our expectation because the comparison of distractor plots for males and females showed that the difference in percentages of correct answers was almost always due to the fact that males reach higher ability levels, as opposed to a different trend in the empirical curves. The percentage of males and females selecting the correct answer is the same for a given ability level, and thus no DIF was detected. The main results of this research are therefore based not on differences in the correct answer but on differences in choosing the other (incorrect) options.

Interview Analysis
To investigate the features of the phenomena from a qualitative point of view, students were given a questionnaire to complete. This included task D21 from the INVALSI 2011 test, Grade 10. In line with the theoretical framework and in order to create conditions as far removed as possible from the didactic contract, questions were chosen that involve basic skills, as it is assumed that high school students solve them without difficulty, thus putting them at ease. Students who correctly answered task 3 (the INVALSI task, object of the investigation) were told about the general failure nationwide among their peers and then asked to suggest reasons for the phenomenon. Meanwhile, with the students who provided the wrong answer, we tried to explore in depth the reasons for the difficulties encountered. Below are some significant extracts from interviews, in which some features of the phenomena under investigation are highlighted. Out of the 18 students interviewed, regarding task 3, 8 students provided the correct answer and 10 gave the wrong answer. The first two tasks of the questionnaire were actually simple; in fact, all the students interviewed provided the correct answer. Although it is not possible to generalize on such a small sample, we observed that the performances are in line with the national sample, while another relevant fact is that all the females interviewed who made a mistake chose option D. Only the extracts deemed most significant from the point of view of the research hypotheses are shown below (ST = Student, F = female, M = male). Two students (ST_F_01 and ST_F_02) supplied the wrong answer only to task 3 of the questionnaire, namely to task 21 of Grade 10 INVALSI (giving the correct answer to the three other questionnaire tasks); they are both female and both chose option D. ST_F_01 first performed all the tasks in silence and autonomously, then the researcher asked her to explain aloud the procedure followed for each task.
The following section refers to the INVALSI task.
R: To solve this task, what reasoning did you follow?
ST_F_01: To be honest, since I do not consider myself smart, I do not feel very smart ... I think I made a mistake because ... well, first of all, we cannot apply the properties of the powers because there is no multiplication, so we cannot add the powers, so we cannot have 75.
ST_F_02 spoke aloud while solving tasks. The following sentences show her reasoning as explained during the INVALSI task solution. Interesting observations also arose when the interviewer asked students about the possible reasons for students' difficulty nationwide. Indeed, some of them referred explicitly to what they usually do during mathematics lessons and to the way they are used to working with powers. This is the case of ST_M_07 who, with regard to the correct answer and the algebraic manipulation, affirmed:
ST_M_07: We are used to seeing and always using the property, so you do not see it. It does not even cross your mind to look for it.
Another interesting interview comes from student ST_F_05, who fails to answer correctly only INVALSI task 21. Finally, explicit reference to didactic practice is also made by this student:
ST_M_03: Because, in my opinion, many exercises are done on powers, and so it becomes stored in the brain as a form of reasoning. It is normal, perhaps, from our point of view to perform an addition or multiplication.

Discussion
The quantitative analysis highlights that this task creates great difficulties for Grade 10 students, even though power properties (and, more generally, the manipulation of algebraic expressions) are extensively addressed in the first years of high school. Only 35% of students supplied the correct answer to this task, and the interviews confirm their difficulties in answering, even when they recognized the topic and referred to other exercises on powers.
Gender differences in choosing the correct answer are evident and favour males. The gap in the percentage of correct answers is 7%, but the distractor analysis reveals a particular feature of this gap: if we consider males and females with the same ability, the percentage of correct answers is the same, and the gap is due to the fact that fewer girls than boys reach the highest ability levels. This is also confirmed by the absence of DIF in this item and is in line with the literature on gender differences: the gap is more evident among top performers. In this case, we can state that the gap is due to the lower number of girls who reach the higher levels.
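The reasoning above (equal success probability at equal ability, yet a gap in the overall percentages) can be illustrated with a minimal sketch of the Rasch model. The item difficulty and the two ability lists are hypothetical values chosen only for illustration; they are not the INVALSI estimates, and the function names are our own.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Rasch model: probability of a correct answer given ability and item difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_percent_correct(abilities, difficulty):
    """Average success probability over a group of students, as a percentage."""
    total = sum(rasch_probability(a, difficulty) for a in abilities)
    return 100.0 * total / len(abilities)

# Hypothetical values, for illustration only.
item_difficulty = 0.8
group_a = [-1.0, 0.0, 0.5, 1.0, 2.0]  # one student at the highest ability level
group_b = [-1.0, 0.0, 0.5, 1.0, 1.0]  # identical, except nobody at the top level

# The probability depends only on ability, not on group membership (no DIF).
p_same_ability = rasch_probability(1.0, item_difficulty)

# The groups still differ in percent correct, because their ability distributions differ.
gap = (expected_percent_correct(group_a, item_difficulty)
       - expected_percent_correct(group_b, item_difficulty))

print(f"P(correct | ability 1.0) = {p_same_ability:.3f}")
print(f"gap in percent correct = {gap:.1f} points")
```

In this sketch the item behaves identically for both groups, so any difference in percent correct comes entirely from the composition of the groups.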
Furthermore, we can observe that there are differences between males and females of all ability levels in the way they choose an incorrect answer. Distractor D (a^(37·38)) is more attractive for females of low and medium ability levels, while distractor B (a^75) is more attractive for males. Both options consist of a power with the same base as the two powers added in the task and, as exponent, the result of an operation between the two exponents. As we expected, these two options are also chosen by students of medium ability levels, who try to solve the task by locating some kind of rule according to which the base must stay the same and the exponents must be operated on. This is explicit, for example, in the ST_F_01 interview, in which she expressed the necessity to operate only with exponents and then chose D, affirming "it must be a property!". This behaviour is attributable to attitudes concerning the didactic contract, according to which many students (when faced with the text of a problem) activate a sort of selective reading, based on the identification of numerical data and on some rules that suggest the right operation to 'combine' the numbers in the text. As we can read in D'Amore et al. (2010), such behaviours are often caused by implicit norms generated by the didactic contract.
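For reference, the identities behind the two distractors can be written out, assuming (as the exponents of distractors B and D suggest) that the expression in the task is the sum of the powers with exponents 37 and 38. The first two lines are genuine properties of powers, but they apply to a product and to a power of a power; for the sum, only the common factor can be taken out.

```latex
\begin{align*}
a^{37} \cdot a^{38} &= a^{37+38} = a^{75}
  && \text{(product of powers: exponents are added)} \\
\left(a^{37}\right)^{38} &= a^{37 \cdot 38}
  && \text{(power of a power: exponents are multiplied)} \\
a^{37} + a^{38} &= a^{37} + a \cdot a^{37} = a^{37}(1 + a)
  && \text{(sum: only a common factor can be extracted)}
\end{align*}
```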
The necessity of finding some norms and rules to solve the task is also expressed by other students. For instance, the first student (ST_F_02) knew how to multiply powers with the same base and understood that she "cannot apply the properties of the powers because we do not have multiplication; therefore, we cannot add the powers, so we cannot have 75" but then she went on to choose option D.
Interviews thus confirm that students who tackle this task are led not to reason on the meaning of this operation, for instance by working on orders of magnitude, but set out immediately to find a rule to apply. This behaviour caused students to exclude options A and C because they expect the answer to be a power with the same base: for instance, ST_M_01 explicitly declares that he excluded options A and C because the base is different. As predicted, for all the students interviewed, the task is not new in terms of content and almost all of them refer to theories studied and other exercises already carried out with powers. The need to find some rule to solve this task is inherent in didactical practices, as stated even by students who answered the task correctly and were then asked to interpret the widespread nationwide difficulties. Students ST_M_03 and ST_M_04 answered the task in question correctly; when trying to identify possible causes of error, they hypothesised that other students had drawn heavily on the identification of rules and properties. They said that, most probably, students who made mistakes have been trained to solve exercises with powers using properties without reasoning on the meaning of the power itself; such students know, on the basis of previous experience, that if the bases are the same they have to operate with exponents, and this led them to exclude the correct option. Therefore, we confirm that the difficulties encountered by students in this task are not connected to a lack of knowledge, but rather to the incorrect use of knowledge related to power properties and to students' conviction that they need to apply procedures and strategies already used during maths lessons. This behaviour is, therefore, connected to an influence of the didactic contract, according both to the analysis of the processes adopted by those students who made mistakes and to the ideas of students considering possible causes of errors.
Students often apply rules and properties when they operate with powers, and then assume they always have to do so, even when they understand that the properties cannot be applied in that particular case. Moreover, we note that the considerable increase in the order of magnitude entailed by option D does not actually hinder students' choice. The causes of this behaviour can be traced to a clause of the didactic contract, that of formal proxy as defined by D'Amore: "The student reads the text, then decides on the operation to carry out and the numbers with which he/she has to work. At that point, indeed, the clause of the formal proxy is triggered. It is no longer for the student to reason and check; he/she no longer considers what follows as his/her personal responsibility […]. The involvement of the student is finished; now it's up to the algorithm or, better still, to the machine, to work for him. The student's next task will be that of transcribing the result, whatever it is, and it does not matter what it means within the problematic context he/she started with." (D'Amore, 2008, p. 16).
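The jump in order of magnitude mentioned above can be made concrete with a small numerical check. This is only an illustrative sketch: the base 2 is a hypothetical choice, the exponents 37 and 38 are inferred from distractors B and D, and the helper `digits` is our own.

```python
# Hypothetical base, chosen only to make the sizes concrete.
a = 2

total = a**37 + a**38        # the sum of the two powers
factored = a**37 * (a + 1)   # the same sum with the common factor taken out
option_b = a**75             # exponents added, as for a product of powers
option_d = a**(37 * 38)      # exponents multiplied, as for a power of a power

def digits(n: int) -> int:
    """Number of decimal digits, a rough proxy for the order of magnitude."""
    return len(str(n))

print("sum equals factored form:", total == factored)
print("digits of the sum:", digits(total))
print("digits of option B:", digits(option_b))
print("digits of option D:", digits(option_d))
```

Comparing the digit counts is the kind of order-of-magnitude reasoning that, according to the interviews, students did not activate once a rule had been selected.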
This behaviour is, of course, also evident in students of medium and high ability levels, who are those most influenced by the didactic contract and misconceptions, as already observed in other studies (Giberti, 2018).
Finally, the interviews also confirmed the strong influence of metacognitive factors, especially for females, in answering this task. Female student ST_F_01 expressed all her insecurity and low self-confidence regarding mathematics before explaining her reasoning; this, of course, led her to try to solve the task using previously adopted strategies instead of focusing on the meaning of the sum of the two powers, even though she demonstrated correct knowledge of the properties of powers in the case of multiplication.

Conclusions
The results of this research fit within the international panorama of research on the gender gap, which is increasingly directed at detecting the underlying causes of different performances by males and females. The analysis of the gender gap on specific tasks can lead to the identification of factors of a didactic nature that have a different impact on males and females, and which can be interpreted through the solid findings of mathematics education theories.
From a quantitative point of view, the analysis allows us to answer our research questions. The analysis performed using the Rasch model highlights a characterization of the gender gap in terms of students' ability level. Moreover, it shows not only that fewer girls than boys reach the highest ability levels but also that, among the students who give the wrong answers, different choices are made by girls and boys. Males and females with different abilities are attracted by different incorrect options; nonetheless, the wrong options most attributable to the didactic contract are chosen by students, both male and female, of "average" mathematical ability. This trend, in line with the literature (Bolondi et al., 2018), is also confirmed by the interviews, which allowed us to frame the phenomena that emerged from the comments of students who chose certain wrong options through the theoretical framework of the didactic contract. Indeed, the qualitative analysis highlights how the students' difficulties in tackling this task are not directly connected to a lack of knowledge; instead, the causes of the errors are attributable to the didactic contract. The students interviewed who chose those options often referred to classroom habits, their relationship with the teacher and the habits of the milieu (Brousseau, 1988). The interviews reveal behaviour specifically related to the didactic contract and, although to a small extent, differences emerge in terms of gender. As is clearly stated in the literature (D'Amore et al., 2010; Sarrazy, 1995), although the intrinsic link with mathematics differentiates the didactic contract from the social contract, social factors still have an impact, exactly as with the gender gap. This paper contributes to the field of knowledge on the relationship between the didactic contract and the gender gap, a topic of increasing international interest in the field of mathematics education research (Bolondi et al., 2018).
By combining quantitative and qualitative analysis, it can be concluded that the gender gap that manifests itself when answering these items is influenced by the didactic contract. The small number of students involved has made it possible to distinguish and characterize male and female behaviour; a research study with a larger sample of students to interview, focusing on the specific connection between didactic contract and gender gap, is currently underway.
We have no conflicts of interest to disclose.
Funding
Open access funding provided by Alma Mater Studiorum - Università di Bologna within the CRUI-CARE Agreement.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.