Schooled and Ready: Assessment Reform

Leaton Gray, Sandra; Scott, David; Mehisto, Peeter

doi:10.1007/978-3-319-71464-6_5



Students in the European School System have since its inception been examined by the European Baccalaureate. The term, Baccalauréat, is used in different ways in different educational systems around the world. In Canada and Belgium it is used to indicate a Bachelor’s degree in Francophone universities. In France it refers to the country’s national school (lycée) diploma and is equivalent to British ‘A’ level qualifications. The English Baccalaureate is a performance measure to assess the work of students in secondary schools in England, Wales and Northern Ireland. In Wales, it is also a pre-university qualification. In Spain it refers to a particular form of post-secondary education. The International Baccalaureate Diploma, the oldest and most important of the four International Baccalaureate programmes, is a curriculum for students aged between 16 and 19. In the United States of America a Baccalaureate service is a farewell address given to a graduating class by a teacher or teachers.

The oldest of these is the French Baccalauréat, an academic qualification taken by French students at the end of high school. It thus signals the end of the compulsory period of education in France, typically at the age of eighteen years, and acts as a means of accessing the next stage of education. It was introduced by Napoleon I in 1808. There are other forms of Baccalaureate, such as the International Baccalaureate, but the qualification was originally developed in France. Its most important feature is that it cannot be awarded in a single subject.

Within France, there are three main types of Baccalauréat: the Baccalauréat Général (General Baccalaureate), the Baccalauréat Professionnel (Professional Baccalaureate), and the Baccalauréat Technologique (Technological Baccalaureate). There are some restrictions placed on the type of Baccalaureate that a student can present at some French universities and it doesn’t confer automatic rights of entry to any and every French university. Students who are registered for the Baccalauréat Général are asked to choose between three streams in their penultimate lycée year. Each of these streams prioritises one specialism over the others; however, this doesn’t mean that the student gives up altogether the study of subjects in other streams. Each stream therefore places different weights (coefficients) on each subject.

The Série Scientifique is specifically designed for students who wish to work in scientific fields such as medicine, engineering and the natural sciences. These students are required to specialize in mathematics, physics and chemistry, computer science or earth and life sciences. The Série Économique et Sociale is designed for students who want to eventually pursue careers in the social sciences, management and business administration, and in economics. The most heavily weighted subjects are economics and social sciences and these are only offered in this stream. The Série Littéraire prepares students for careers in the public services. The most important subjects in this stream are philosophy, modern French language and literature, and other modern foreign languages.

If a student is a pupil at a vocational lycée, they can prepare for either the Certificat d’Aptitude Professionnelle (CAP) or the Brevet d’Études Professionnelles (BEP). They can also study for a Brevet des métiers/d’art (BTM or BMA) or a Mention Complémentaire (MC). The Brevet d’Études Professionnelles is considered to be more theoretical than the Certificat d’Aptitude Professionnelle, and some students after completing the first of these then go on to study a vocational Baccalaureate such as the Baccalauréat Professionnel. Technological Baccalaureates were introduced in 1968 and are grouped into three Series. The first Series includes engineering, physics, chemistry, biology, medical sciences and microtechnologies. The second Series includes business administration, management, and commercial and computer technologies. The third Series includes the applied arts, computer techniques, and techniques of music and dance. The 1992 Reforms extended this to: industrial science and technology, science and technology laboratory work, tertiary science and technology, medical social sciences and hospitality. As a result of the 2011 reforms, there are now eight Series of technological studies.

The majority of the Baccalauréat examinations take place every June. For lycée students, this falls at the end of the terminale, the final year. Most of these examinations are of an essay format. The student is given a substantial block of time (depending on the examination, from two to five hours) to complete the written examination, setting out the various arguments around a topic. Mathematics and science examinations involve problem solving, in addition to writing short essays. Students taking foreign language examinations have to be able to translate text as well. In mathematics and the life sciences, the use of questionnaires à choix multiples (multiple choice questions) is common. All Baccalaureate students are also required to complete a short research project, known as the travaux personnels encadrés. These are formal examinations, conducted in controlled examination conditions. To further ensure fair marking by the examiners, the test is anonymous, thus eliminating any marking bias that may occur due to favoritism based on sex, religion, national origin or ethnicity.

The principles underpinning the Baccalaureate idea are those of breadth, comprehensiveness, cultural maturation, curriculum integration, allowing weak boundaries between subject disciplines and balancing the demands of specialization with a more rounded and general education. As we have noted, it cannot be awarded in a single subject. Consequently, all the students have to study all the subjects in a curriculum, even if some of these subjects are studied in more depth than others. In theory at least, the Baccalaureate can uniquely provide students with a gestalt (using this term in its original sense) that can act to frame their subsequent life and behaviours. They grow as a person as a result of an individual and cultural maturation or bildung.

The European Baccalaureate

Students in the European schools or in schools accredited by the Board of Governors are examined at the end of their schooling in the system through the European Baccalaureate process, and thus the use of this term refers to a programme of study (two years – S6 and S7 – in this case), an award which has currency in the European Union, and an examination, which is designed to test for knowledge, skills and dispositional elements of the curriculum that the student has followed over the previous two years. As we have seen in Chaps. 2 and 3, the European Baccalaureate cycle consists of a broad multilingual subject-based curriculum, in which students are obliged to take a combination of language, humanities and scientific subjects, with in many cases these subjects being taught through more than one language.

The core curriculum comprises: at least two language subjects (the dominant language and another one); mathematics, either 3 periods/week or 5 periods/week; one scientific subject, either biology 2 periods/week or any other 4-period scientific subject in either biology, chemistry or physics; history and geography, either 2 periods/week or 4 periods/week, which are taught through a different language from the dominant one, either in French, English or German; philosophy, either 2 periods/week or 4 periods/week; physical education; and ethics or religion. In addition to this core curriculum, students choose from a wide range of options, and this amounts to a minimum of 31 periods per week and a maximum of 35 periods each week.

Candidates take three oral examinations (L1, L2 or a subject taught through L2 such as history, geography or economics). Consequently, candidates are required to demonstrate written and oral proficiency in at least two languages. They are also required to take five written examinations: language 1 or advanced language 1, language 2 or advanced language 2, mathematics (5 periods) or mathematics (3 periods), option I (4 periods) and option II (4 periods). The following three factors are taken into consideration for the award of a European Baccalaureate: the average preliminary mark (C) expressed out of 100, the average written examinations mark (W) expressed out of 100, and the average oral examinations mark (O) expressed out of 100. The proportion of the final total mark for the examination allotted to the various parts is as follows: 50% for the average preliminary mark C, 35% for the average W for the written examinations, and 15% for the average O for the oral examinations. The final result is 0.50 C + 0.35 W + 0.15 O.
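As a minimal sketch of the weighting just described, the calculation can be written out as follows. The function name and the sample marks are illustrative assumptions, not part of the official regulations; only the three weights and the 0–100 scale come from the text.

```python
def final_eb_mark(c: float, w: float, o: float) -> float:
    """Return 0.50 C + 0.35 W + 0.15 O, with each average expressed out of 100.

    c: average preliminary mark, w: average written examinations mark,
    o: average oral examinations mark.
    """
    for name, value in (("C", c), ("W", w), ("O", o)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must lie between 0 and 100")
    return 0.50 * c + 0.35 * w + 0.15 * o


# Hypothetical candidate (invented marks, for illustration only).
example_final = final_eb_mark(c=80.0, w=70.0, o=90.0)
print(round(example_final, 2))
```

Because the preliminary mark carries half of the total weight, a change of ten points in C moves the final result by five points, whereas the same change in O moves it by only one and a half.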

The preliminary mark is made up of the following: class marks (A marks) and examination marks (B marks). Class marks account for 20 marks out of 50 for the purposes of calculating the preliminary mark (C mark). A class mark is given for each subject taken in year 7 (S7), with the exception of religion/ethics, at the end of each semester. The marks for the examination account for 30 marks out of 50 and are used for the purposes of calculating the preliminary mark (C mark). A mark is given for each subject, with the exception of religion/ethics, on the basis of the results obtained in the examination. Compulsory subjects (with the exception of physical education and religion/ethics), options, and advanced subjects can be the subject of written and oral examinations.
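The 20:30 split between class marks and examination marks can be illustrated with a short sketch. It assumes, purely for illustration, that the averaged A and B marks are each expressed out of 100 and that the resulting preliminary mark is rescaled to 100; the text gives only the 20-out-of-50 and 30-out-of-50 proportions, and the function name is hypothetical.

```python
def preliminary_mark(a_avg: float, b_avg: float) -> float:
    """Combine class marks (A) and examination marks (B) into a C mark.

    Assumption: a_avg and b_avg are averages expressed out of 100.
    A marks contribute 20 of the 50 points, B marks 30 of the 50.
    """
    a_points = 20 * a_avg / 100   # share of the 50 points from class marks
    b_points = 30 * b_avg / 100   # share of the 50 points from examinations
    return (a_points + b_points) / 50 * 100  # rescaled to a mark out of 100


# Hypothetical pupil (invented marks): A average 75, B average 85.
print(round(preliminary_mark(75.0, 85.0), 2))
```

On these assumptions the class marks carry 40% of the preliminary mark and the examination marks 60%.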

Each examination covers the entire syllabus of the corresponding subject in S7 but is also designed to assess the competences (knowledge, skills and dispositions) acquired in previous years, especially in S6. The marks awarded in both the written and oral examinations are subject to a double moderation and marking by both the candidates’ teachers and the external examiners. The final mark is the average of the two examiners’ marks. In the case of a mark-disagreement of over two points, a third external moderator is brought in and their task is to establish through a thorough analysis of the previous moderations a final mark between the highest and the lowest awarded by the two previous markers.
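The double-marking rule can be sketched as a small decision procedure. The names below are hypothetical, the marking scale is not specified in the text, and treating "between the highest and the lowest" as inclusive of both bounds is an assumption.

```python
from typing import Optional

DISAGREEMENT_THRESHOLD = 2.0  # a third moderator is needed above this gap


def moderated_mark(first: float, second: float,
                   third: Optional[float] = None) -> float:
    """Return the final mark for one examination script.

    first, second: marks from the two markers.
    third: mark set by the third external moderator, if one was needed.
    """
    low, high = sorted((first, second))
    if high - low <= DISAGREEMENT_THRESHOLD:
        # Normal case: the final mark is the average of the two marks.
        return (first + second) / 2
    if third is None:
        raise ValueError("disagreement of over two points: third moderator required")
    if not low <= third <= high:
        raise ValueError("third mark must lie between the two earlier marks")
    return third


print(moderated_mark(7.0, 8.0))        # within two points: averaged
print(moderated_mark(5.0, 8.0, 6.5))   # over two points: third moderator decides
```

The sketch makes one property of the rule visible: the third moderator cannot move the result outside the range already set by the first two markers.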

Assessment in the European Baccalaureate cycle is criterion-referenced. Though norm-referenced systems of assessment have become less popular, criterion-referenced systems are not without their problems. Systems with a simple pass-fail result such as a driving test are much easier to operate than complex multi-level systems such as the European Baccalaureate. Criteria are relatively easy to identify for use in testing a performance like driving proficiency, but harder to associate precisely with a range of levels of learning as in a school curriculum. In addition, criterion-referenced systems conflate logical hierarchies of skill and content with developmental approaches to the teaching of students. Establishing criteria appropriate to the various levels involves some notion of an average student, which is always difficult to determine. Criterion-referenced assessment measures pupils’ attainment in relation to the level at which the learning objectives and required competences defined in a given syllabus (in the case we are dealing with here, the European Baccalaureate) have been attained. The European Baccalaureate inspectorate also provides assessment and marking guidelines for criteria for both oral and written examinations.

Assessments may be more or less integrated with the teaching programmes that pupils follow. Some kinds of assessment (for example, IQ tests) are not designed to measure pupils’ learning (or the results of a teaching programme), in which case they are often associated with measures of qualities supposedly inherent in the student, such as intelligence. Assessment which is placed at the integrated end of the continuum is likely to be more informal than formal, more formative than summative, process rather than product-orientated, and to be frequent or continuous rather than taking place at one time point, usually at the end of the programme of study. The European Baccalaureate offers greater scope to the designers of the various curricula, because it is better integrated with the programmes of study.

Assessment in the European Baccalaureate is intended to be both formative and summative. Formative assessment focuses on the process of learning. It is reflected in the so-called A marks. The A marks represent the pupils’ everyday work in a subject, which consists of a variety of tasks and aspects such as: the degree of focus and attention in class; the students’ active participation and the quality of their interventions in class; the regularity and consistency of their work in class and at home; how positive their attitude towards the subject is; whether they show signs of initiative, independence and autonomy; and the progress they are making. These forms of formative assessment are in the main attitudinal and can be thought of as regulative devices (i.e. regulating the behavior of the person), rather than learning experiences and assessments of competences.

Summative assessment reflects the performance of a pupil at the end of a given period of instruction. These Baccalaureate examinations are designed to assess the pupils’ competences acquired over an extended period of time in a range of subjects. They are held under standardized physical conditions (relating to room arrangements, the use of specific formats, invigilation processes, etc.) and under time constraints. Formative modes of assessment are most closely associated with the process of teaching itself, but it is the results of these summative tests that are most visible and public. Formative dimensions of assessment focus on providing information for the teacher about the way learners complete particular tasks. The information provided is intended to feed directly into the teaching process, so the focus is on how students tackle these tasks and how they go about solving problems that they are given. The assessment environment does not need to be standardized during formative processes of assessment.

Summative assessment is concerned with determining whether students have mastered particular elements of the curriculum. Summative assessments aim to be reliable and valid; and homogeneity of context is considered to be important so that comparability becomes possible. A summative assessment marks some point in the otherwise potentially organic teaching and learning process at which it is decided to stop teaching and give one’s full attention to assessment. The stage at which it is most important to carry out this kind of assessment is often determined by factors other than those arising from learning goals, such as predetermined times in the school year, or a requirement to report to other interested parties, as we have seen in the European Baccalaureate.

European Baccalaureate diploma holders enjoy the same rights and benefits as other holders of secondary school-leaving certificates in their countries, including the same right as nationals with equivalent qualifications to seek admission to any university or institution of tertiary education in the European Union. This issue is an important one and we discuss it in more detail in Chap. 6.

In 2007 the Board of Governors commissioned an external evaluation of the European Baccalaureate, the objectives of which were: to determine its fitness for purpose, its quality, the extent of its recognition by the member states, and whether it was in a fit state to be offered to students outside the European schools. The Report was received in 2008 and, though this is to some extent the fault of the specification of work given to the evaluators, it failed to adequately address the make-up, both in a practical and normative sense, of the internal and external relations of the Baccalaureate, focusing on a small number of technical issues, at the expense of examining fundamental curriculum and assessment principles.

The evaluators, Cambridge Assessment, argued that there were no curriculum incoherencies or grossly inappropriate contents, approaches, or demands in the European Baccalaureate. They did, however, identify one subject, Geography, which appeared to require urgent review. It is worth reminding ourselves that this review was completed in 2008, and that a lot has changed since then in the programmes of study. They further argued that there was a relatively restricted range of subjects, suggesting the possibility of including business-related and applied subjects, non-European languages, drama and media studies in the curriculum. Science syllabuses, they suggested, should be updated and a stronger and more coherent approach to the development of enquiry-based and investigative skills established.

The European Baccalaureate involves a high volume of internal assessment by teachers. This, they argued, is a potential strength, establishes an integrated learning and assessment model and makes a positive contribution to its validity. However, while European Baccalaureate teachers are very experienced, opportunities for ensuring that all teachers have access to early induction and standards training, they suggested, were vital. The extent to which common standards can be shown to apply across all subjects was also an issue for the evaluators in the marking of the final examinations where systems of marking review across subjects, between examiners and across years are not well-defined.

Greater clarity would be achieved, they suggested, by statements of actual time in the programme. In terms of weightings between different parts of the programme, the evaluators were of the view that the value contributed by the internal assessment of preliminary marks should be retained. Proposals for a revised weighting of written examinations relative to L1 and L2 oral examinations, they argued, would seem to overstate the contribution which a student’s oral performance in languages made to their overall European Baccalaureate score, particularly for those students who were preparing for science, medicine and engineering courses at university. The practice of double marking, they suggested, should be reviewed, and the evaluators urged the European Baccalaureate curriculum developers to move towards ‘virtual’ standardization approaches, particularly using digitised scripts and on-line marking.

Significantly, they argued in their Report, all examinations should be regarded as ‘high stakes’. Those examinations, which mark the end of secondary education and provide for progression to university, are of the highest importance to individuals and impose rigorous standards of accountability on assessment bodies. Finally, the evaluators considered that the adoption of quality models such as ISO 9001 certification or the quality assurance procedures developed by the Association of Language Testers in Europe (ALTE) would be of value. They urged the European Baccalaureate to also consider the establishment of its own Code of Practice to complement the more administratively oriented focus of the arrangements for implementing it. Some of these suggestions have been taken up by the European Baccalaureate curriculum makers, though none of them address the fundamental tensions and difficulties caused by the conflation of formative and summative purposes, the poor use of coursework processes, the confusions surrounding oral assessments (their use should be commensurate with their capacity to validly assess some aspect of the curriculum), and the inclusion of regulative activities in assessment processes (as they are currently expressed in the assessment arrangements).

Reorganising the European Baccalaureate

In 2015 we wrote a report about the European Baccalaureate, in the form of an evaluation of a proposal for the reorganisation of secondary studies in the European Schools for secondary years 4, 5, 6 and 7 (cf. Leaton Gray et al. 2015). The objectives of the study were to establish and demonstrate the impact of the proposed new structure for secondary studies (i.e. levels S4–S7, though reference is also made to S1–S3 on the grounds that forms of progression and curriculum coherence require consideration of lower secondary as well as upper secondary studies), compared to the current situation. In addition, we sought to determine whether and to what extent the proposals: met the principles stated in the Convention; ensured access to European secondary and tertiary education systems; took into account the mandate given by the Board of Governors; took into account the needs of pupils faced with the demands of the modern world; were relevant, coherent, comprehensive, and allowed breadth of study for all pupils in the system; conformed to the accepted and logical principles of curriculum design; and guaranteed in the last two years, leading to the European Baccalaureate, a general education around the eight key competences for lifelong learning.

Our suggestions were comprehensive and in accord with the principles that underpin the construction of productive learning environments. We suggested that Baccalaureate rules should be amended so that each student takes eight examinations; the determination of each of these examinations, i.e. whether they should include oral, coursework and/or written papers, and the relations between them, is discussed below. We argued that forms of discriminatory groupings, such as streaming, setting, multi-age and multi-grade arrangements, should be minimised insofar as resources within the system and institutions allow this to happen. Addressing the nine-year upper tenure limit for European schools’ teachers, and the loss of organizational knowledge that is associated with removing these skilled practitioners at the end of their tenure, often to be replaced with a Chargé de Cours (locally hired) teacher who is not appointed via the same route, was one of our strongest recommendations. This was to ensure that the European Schools System and its various institutions, i.e. the schools, retained their institutional memories. In addition, candidates, we suggested, should take eight examinations: language and communication (L1), mathematics, language and communication (L2), humanities, expressive and performative studies, science, social studies, option 1 and option 2. In both option slots, students should choose between streams. They should only be allowed to make one choice from their stream in this pathway.

Each examination should consist of four elements: coursework, practical, oral and a written paper. The proportion of the final total mark for the examination allotted to the various parts as a result should depend on the curriculum content (i.e. knowledge constructs, skills and dispositions) of the subject area. In other words, not every subject should be tested through all four elements, but only through those elements that refer to the type of curriculum content of the subject. For example, language and communication (L1) should be tested through 30% coursework (C), 20% oral (O) and 50% written examination (WE). The final result would then be 0.30 C + 0.20 O + 0.50 WE. Class marks should no longer be awarded as this is a summative examination. Orals and practicals should be conducted one month before the date of the examination in each subject. Coursework, oral and practical completion and assessment rules would need to be written, with the following principles applied. Each task is criterion-referenced with those criteria being open and available to students. Marks are allocated to each criterion and made public. The work should be completed in non-regulated settings. It should be marked by the teacher, sample-moderated by the Baccalaureate office, and sample-moderated by an external examiner (to the system), who in addition benchmarks the marking against comparable systems. Marks would not be released until the final examination result had been declared. These are practical recommendations. However, they should only be developed with regard to a full understanding of assessment and examination practices.
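The proposed per-subject weighting can be sketched in the same way. Only the L1 weights (30% coursework, 20% oral, 50% written examination) are given in the text; the element names, the function, the assumption that each element is marked out of 100, and the sample marks are hypothetical. Note that C here stands for coursework, not for the preliminary mark of the current arrangements.

```python
ELEMENTS = {"coursework", "practical", "oral", "written"}

# Weights stated in the text for language and communication (L1).
L1_WEIGHTS = {"coursework": 0.30, "oral": 0.20, "written": 0.50}


def subject_mark(weights: dict, marks: dict) -> float:
    """Weighted mark for one subject, using only the elements it is tested by.

    Assumption: every element mark is expressed out of 100.
    """
    unknown = set(weights) - ELEMENTS
    if unknown:
        raise ValueError(f"unknown assessment elements: {sorted(unknown)}")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * marks[element] for element, weight in weights.items())


# Hypothetical candidate marks for L1 (invented for illustration).
l1_marks = {"coursework": 60.0, "oral": 80.0, "written": 70.0}
print(round(subject_mark(L1_WEIGHTS, l1_marks), 2))
```

A subject that is not tested through a given element simply omits it from its weights, which reflects the principle that not every subject should be tested through all four elements.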

Examinations

Here we focus on the general notion of assessment in its many guises and forms. All of these manifestations reflect decisions that have been made and will be made in the future about who and what is assessed, for what reason and in what way, and they all reflect a particular social context. What this means is that the particular forms of assessment that are adopted are dependent on how those social contexts are and have been constructed. The underlying principle when we are dealing with assessment practices is that educational assessment must be understood as a social practice. Moreover, although it is possible to trace policy issues in assessment back to the earliest days of public examinations when, for example, the Emperor Napoleon recognized the powerful contribution nationally controlled assessment procedures could play in cementing national unity, in recent years the importance of assessment as a policy tool has grown enormously as governments and education systems have increasingly come to realize its powerful potential as a mechanism of social control.

For Michel Foucault (1979), the examination combines the techniques of an observing hierarchy and those of a normalizing judgment. The examination therefore enables society to construct individuals in particular ways. Knowledge of persons is created which has the effect of binding individuals to each other, embedding those individuals in networks of power and sustaining mechanisms of surveillance, which are all the more powerful because they work by allowing individuals to govern themselves. The examination introduced a whole new mechanism that in effect both contributed to a new type of knowledge formation and constructed a new network of power, all the more persuasive once it had become established throughout society.

This mechanism works in three ways: firstly, by transforming what can be seen and observed into an exercise of power; secondly, by introducing the idea of the individual into the field of documentation; and thirdly, by turning each individual into a case. In the first instance, disciplinary power is exercised invisibly and this contrasts with the way power networks in the past operated visibly, through the explicit exercise of force. This invisibility works by imposing on subjects a notion of objectivity that acts to bind them to a truth about that examination, a truth that people find hard to resist. The examined person understands themself in terms of criteria that underpin that process, not least that they are successful or unsuccessful. The examination therefore works by arranging objects or people in society.

In the second instance, the examination allows the individual to be archived by being inscribed textually. An attempt is made to position these knowledge-development activities as contributing to better and more progressive framings of society. Over the last twenty years in schools in Europe, the proliferation and extension of assessment through such devices as key stage tests, records of achievement, examined course work, education certificates, and school reports, and evaluation through such devices as school inspection, teacher appraisal, profiles and the like, means that teachers and students are increasingly subject to disciplinary regimes of individual measurement and assessment which have the further effect of determining them as cases.

The third of Foucault’s modalities refers to the objectification of the individual as a branch of knowledge, so that the individual can now be described, judged, measured and compared with others. One final point needs to be made about the examination, and this is that for the first time the individual could be scientifically and objectively categorized and characterized through a modality of power where difference becomes the most relevant factor. Hierarchical normalization becomes the dominant way of organizing society. Foucault is suggesting here that the examination itself, seemingly a neutral device, in reality acts to position the person being examined in a discourse of normality, so that for them to understand themselves in any other way is to understand themselves as abnormal and even as unnatural. This positioning works to close off the possibility that the persons being examined might see themselves in any other way, though it may not be successful.

Assessment serves a wide range of purposes, ranging from the most commonplace of exchanges in a restaurant for example to school reports and high-stakes examinations, from individual job interviews to national monitoring. What unites all of these is the sense in which assessment first and foremost is a proxy for determining the quality of something or someone. It therefore operates as a mechanism for placing that person or object in a particular hierarchy of values: this person is better than this other person with regard to a particular range of skills and this school is better than this other school because its students have graduated with better examination results. This spectrum of communication ranges from the most informal of exchanges to the extremely formal, the common factor being the use of assessment data of one kind or another as a publicly acceptable code for quality. Closely associated with this is the issue of legitimacy. The results of any particular assessment device have to be trusted by the public if the consequences are to be acceptable. Sadly, assessment issues are generally treated as technical matters, as focusing on improving the methodologies used to assess people rather than on the purposes or consequences of using such approaches. We can see this most clearly in the 2008 Evaluation of the European Baccalaureate by Cambridge Assessment.

What this in effect means is that on occasions clear contradictions and tensions between common assessment practices emerge. An example of this is the incompatibility between policies and practices that lead to an increasingly test-driven educational and curricular culture and an explicit commitment to lifelong learning processes. Another example might be the tension between summative and formative purposes in an assessment. This learning agenda, exemplified in the notion of formative assessment, is at odds with the use of punitive high-stakes testing, which has as its principal purpose raising standards, though the notion of a standard is in itself a contentious issue. Another tension within the system focuses on the more or less contradictory pressures of maintaining and indeed even increasing enrolment whilst at the same time keeping standards high and ensuring that the public have confidence in those standards. Dore’s (1976) now classic study of qualification inflation showed how the interaction between the supply of qualifications and the availability of employment opportunities tended to result in the pursuit of ever-higher levels of qualification as a form of educational inflation.

Internationalization of Assessment

An extremely important aspect of assessment is its increasing internationalization, exemplified by large-scale cross-national assessment studies, such as the Programme for International Student Assessment (PISA). Andreas Schleicher (2013), from the Organization for Economic Cooperation and Development (OECD), uses a methodology that involves the ranking of a variety of countries in relation to their performance on a series of tests, and then the identification of those systemic elements that are present in high performing countries and not present in low-performing countries. From this he concludes that it is possible to identify the optimum conditions for a system’s effectiveness. He is therefore able to suggest that: children from similar social backgrounds can show very different performance levels, depending on the school they go to or the country they live in; there is no relationship between the share of students with an immigrant background in a country and the overall performance of students in that country; there is no relation between class size and learning outcomes within or across countries (the conceptual framework he works to here makes the unjustified assumption that all the different types of learning activities are optimally performed with the same class size); there is no incompatibility between the quality of learning and equity since the highest performing education systems combine both; all students are capable of achieving high standards; and more generally, top performing education systems tend to be more rigorous, with fewer curriculum items and with these being taught in greater depth.

The approach has a number of flaws in its conceptualisation and application. The first of these is the assumption that a person has a knowledge, skill or dispositional set, which is configured in a particular way (i.e. it has a grammar), and that it is this knowledge, skill or dispositional set, or at least elements of it, which is directly assessed when that person is tested. In fact, any testing that is carried out with the purpose of determining whether these attributes are held, not held, or even partially held by an individual always involves an indirect process of examination, where the additional element is a conjecture, retroduction, inference or best guess.

A second false belief is that this grammar is organised into elements, there are relations between those elements, and each element can be scaled, which can then be directly investigated. This can be contrasted with a position which suggests that, in the application of the knowledge, skill or dispositional set, whether for the purposes of testing or for use in everyday life, a range of other knowledge elements, skills and dispositions are referred to. There is, therefore, a set of factors that in combination may result in construct-irrelevant variance (cf. Messick 1989), that is, variance amongst a population of testees as a result of factors that do not have anything to do with the construct being tested. Even if knowledge of or competence in the construct is equally distributed in this population, some testees will do better than others (that is, on their actual scores) and this is not because they have greater knowledge or are more competent in the construct being tested. This might involve either construct-under-representation or construct-over-representation, and within the confines of the test itself it is impossible to determine which of these has occurred.

A third false belief is that in the use of a knowledge-set, or in the performance of a skill, or in the application of a disposition, no internal transformation takes place. There is also an external transformative process at work, and thus a fourth false belief is that testing a person’s knowledge, skills and aptitudes has no washback effects on either the original knowledge construct, or the internally transformed knowledge set ready for testing. In contrast, the well-documented process of washback works in just this way, so that instead of the assessment acting merely as a descriptive device, it also acts in a variety of ways to transform the construct it is seeking to measure.

A fifth false belief is that the process of testing works in a unidirectional linear fashion. For example, a person knows something, that person is subjected to a test which is designed to test for traces of that learning in a population of knowers with similar characteristics, and a score in relation to that construct is recorded indicating that the person either knows it, doesn’t know it or knows it to some extent. No consideration is given to bidirectionality, incorporating forward and backward flows, so that the taking of the test and the recording of the mark impact on and influence the original knowledge construct. This changes the structure (both quantitatively and qualitatively) of the construct, and its affordances, making the original determination of it and them unreliable.

A sixth false belief is that different types of knowledge, including those at different levels of abstraction, can be tested using the same algorithmic process; and a seventh false belief is that the performance on the test represents to a greater or lesser extent (given that the person may have been distracted or constrained in some way or another) what the testee can do or show, rather than there being a qualitative difference between the performance on the test and the construct, skill, or disposition of the testee. An individual may have to reframe their knowledge set to fit the test, and therefore the assessment of their mastery of the construct is not a determination of their capacity in relation to the original construct, but a determination of whether they have successfully understood how to rework their capacity to fit the demands of the testing technology.

An eighth false belief is that a test can be constructed which is culture-free or free of those issues that disadvantage some types of learners at the expense of others. The extent of cultural bias in the PISA tests is unrecognised and certainly under-reported. In addition, a particular technical problem with PISA relates to its sampling procedures. If different types of sampling are used in different countries, then some of these countries will be disadvantaged compared with others. Sampling issues are present in any test. They range from selecting children from a number of grade levels without specifying proportions from each grade, through selecting parts of countries for reporting purposes and ignoring the rest, as in the 2015 PISA tests (OECD 2016), where only the richest and better educated cohort of learners was entered (from Shanghai) and allowed to represent China as a whole, to the selective non-participation (decided by the individual country) of some types of schools in some countries and not others. Cultural differences take a number of different forms, such as ascribing different values, and different strengths of values, to cultural items, or determining the nature, quality, probative force, relevance-value and extent of evidence, or focusing on practices which may be more familiar to people in some countries and less so in others. However, more importantly, cultural differences with regard to the selection of test items refer to the expression of the problem to be solved. If, for example, different national idioms, different national ways of thinking embedded in language forms, and different normic values woven into the fabric of national discourses are ignored, then the presentation of the actual test items as well as the range of possible answers that can be given may favour students from one nation at the expense of students from another.
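
The sampling point can be made concrete with a small arithmetic sketch. The regions, population shares and mean scores below are invented for illustration and are not taken from PISA or from any country's data; the function names are likewise hypothetical. The sketch only shows that when one high-scoring part of a country is entered and allowed to stand for the whole, the reported mean differs from the population-weighted mean.

```python
# Hypothetical regions of one country: (share of student population, mean score).
# All figures are invented for illustration; none come from PISA or the chapter.
REGIONS = {
    "affluent_city": (0.10, 580.0),
    "other_urban": (0.40, 500.0),
    "rural": (0.50, 440.0),
}


def national_mean(regions):
    """Population-weighted mean score over every region supplied."""
    total_share = sum(share for share, _ in regions.values())
    weighted_sum = sum(share * score for share, score in regions.values())
    return weighted_sum / total_share


def reported_mean(regions, entered):
    """Weighted mean over only the regions that were entered for the test."""
    subset = {name: regions[name] for name in entered}
    return national_mean(subset)


if __name__ == "__main__":
    full = national_mean(REGIONS)
    partial = reported_mean(REGIONS, ["affluent_city"])
    print(f"whole country:       {full:.1f}")
    print(f"entered region only: {partial:.1f}")
    print(f"overstatement:       {partial - full:.1f}")
```

Under these assumed figures the entered region scores well above the country as a whole, so a ranking built on the entered region alone would compare unlike with unlike.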

There are a number of ways of identifying good practice within a system of education. The first is identifying outputs from the system (these can be test scores, dispositional elements, acquired skills, ethical and moral qualities); that is, outputs that have resulted from the individual’s participation in the system itself. The argument is then made that one system is better than another because it has better outputs, and, further to this, that the characteristics of these national systems should be bottled up and transferred to those countries or jurisdictions or systems which are considered not to be successful or effective in these terms. It is interesting that the European School system is wedded to the use of quantifying, reductionist and in some cases misleading measures to determine whether it is successful or not.

If the information collected about individuals in a system of education at the end of their time spent in the system is used to make judgements about the quality of provision within it, then there are two possibilities: raw scores, where student scores are aggregated to allow comparative judgements to be made about schools, districts, states or nation states; and value-added scores, where value-added data analysis models the input of particular institutions or systems, such as schools, in relation to the development of the individuals that belong to those institutions or systems. As a result of these processes, a value can be attached to the input of the educational institution or nation as it has impacted on the progress of the individual(s) who attended it, or been a part of it. The accuracy of such modelling depends on the belief that the educational researcher has in the reliability and validity of the data that are used, on the decisions they make about which variables to use in the modelling process, and also on the ability of the researcher to develop appropriate indicators or quasi-properties to reflect the actual properties of individuals, educational institutions and nations, and their covariance in real-life settings. This in theory allows one to make comparative judgements between students, schools, districts, states or nation states, though all the systems that have been devised and used have in one way or another proved to be unsatisfactory.
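
The difference between the two kinds of score can be sketched as follows. This is a minimal illustration, not a model used by any examination system: the student records are invented, prior attainment is assumed to be the only input variable, and "value added" is taken here to mean a school's average residual from a single straight-line fit of final score on prior attainment. A different choice of variables would give different values, which is the dependence on modelling decisions noted above.

```python
from collections import defaultdict
from statistics import fmean

# (school, prior attainment, final score); invented figures for illustration only.
STUDENTS = [
    ("A", 70.0, 78.0), ("A", 80.0, 86.0), ("A", 90.0, 94.0),
    ("B", 40.0, 58.0), ("B", 50.0, 66.0), ("B", 60.0, 74.0),
]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def raw_scores(students):
    """Mean final score per school, ignoring intake."""
    by_school = defaultdict(list)
    for school, _prior, final in students:
        by_school[school].append(final)
    return {school: fmean(scores) for school, scores in by_school.items()}


def value_added(students):
    """Mean residual per school after predicting final score from prior attainment."""
    slope, intercept = fit_line(
        [prior for _s, prior, _f in students],
        [final for _s, _p, final in students],
    )
    by_school = defaultdict(list)
    for school, prior, final in students:
        by_school[school].append(final - (intercept + slope * prior))
    return {school: fmean(res) for school, res in by_school.items()}


if __name__ == "__main__":
    print("raw:", raw_scores(STUDENTS))
    print("value added:", value_added(STUDENTS))
```

With these invented records, school A has the higher raw mean because its intake had higher prior attainment, while school B has the higher mean residual. The two measures can therefore rank the same schools in opposite orders.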

A further way of determining quality in a system is by identifying a norm so as to allow a comparison to be made. For example, a system of education, whether international, national or local (or even cross-national as in the European School System), can be compared with, and marked against, a model of best practice, where this model is constructed in terms of the inclusion of all the possible elements that could and should form an education system (i.e. structures, institutions, curricula, pedagogic arrangements and evaluative procedures), their arrangement in the most logical way (for example, that curricular intentions should precede pedagogical approaches and indeed derive their credibility from these curricular intentions), and the identification and enactment of logically formed relational arrangements between these elements (i.e. that evaluative washback mechanisms should not be allowed to distort the curriculum as it was originally conceived). The norm that is used comparatively is constructed through sound logical and philosophical foundational principles. And in addition the meaning of concepts is treated as an empirical matter, as to how they are used in communities. A reliance on outputs in the comparative process is unsafe and more importantly likely to be invalid. The preferred methodological approach then becomes a searching for mechanisms, relations and structures that are potentially causally efficacious, can be contextualised (historically, culturally and socio-economically), but can also contribute to human wellbeing. And in turn this would involve the avoidance of reductionist and decontextualized accounts (such as in Mourshed et al. 2010) of how education systems round the world operate.

What it is possible to argue is that there is now a world trade in educational policies, especially in relation to assessment issues. This policy borrowing, the take-up of apparently good ideas developed in one country by another, has further strengthened the grip of conventional assessment assumptions. Despite the significant evidence concerning flaws in international comparisons of student achievement, the power of the simple messages that can be and are derived from them about relative national success in a world of increasingly global competition has acted to reinforce the prevailing domination of established forms of educational assessment.

Validity and Use

Samuel Messick (1989) argued some time ago that the validity of assessment practices inheres in the consequences that follow from their use. The impact of assessment on the lives of individuals is becoming more widespread and serious with its growing importance across the world. It follows that there is clearly a need for more thorough explorations of both the validity and the reliability of the various approaches to designing and interpreting the test data that are commonly used by governments (and by education systems such as the European School system) and which command the confidence of a public which does not understand their technical limitations. The research data, though rich and varied, show that current policies are ill-informed and almost certainly far from the best.

Some of the defining aspects of recent assessment research stand out with quite remarkable clarity. Chief amongst these is the increase in assessment activity of all kinds and the penetration of assessment in its various guises into almost every aspect of human endeavour. We have become assessment societies, as wedded to our belief in the power of numbers, grades, targets and league tables to deliver quality and accountability, equality and defensibility as we are to modernism itself. History will readily describe the 1990s and 2000s as ‘the assessment era’, when belief in the power of assessment to provide a rational, efficient and publicly acceptable mechanism of judgement and control reached its high point across the world.

The assessment revolution has been one of scale, range and significance; a revolution that has elevated quantitative data, the raw material of most public assessment, as the principal mechanism for delivering transparency, accountability and predictability. The collection of data has become in itself a major instrument of social control, whether this is at the level of the individual, the institution or indeed whole operational systems such as that of education.

All these various criticisms are helping to challenge the assumptions on which most of the existing edifices of assessment have been built. Belief in the power of conventional summative assessment techniques to be objective and efficient, to motivate present performance and to predict future performance, is being challenged by a range of research evidence that identifies significant flaws in these assumptions. Moreover, this evidence highlights the worrying consequences of using assessment to measure and control, including reduced motivation and significantly lower performance on the part of students.

Much of the familiar contemporary apparatus of assessment technologies was born of the modernist assumptions and educational needs of the nineteenth century. The assumptions informing these approaches can be identified as: that it is possible to identify relative levels of student performance as the basis for educational selection; that such identification can be undertaken with a sufficient degree of objectivity to provide a broadly fair outcome for the candidates affected; that the quality of such assessment is embodied in notions of reliability and validity; that students’ scores on national examinations and tests provide a valid indicator of the quality of institutional performance; and that it is possible usefully to compare the productivity of individual education systems through international comparisons.

Assessment standards can be used in a number of different ways, with different consequences. They can be used to determine whether and in what way the individual is meeting them, as well as providing information about how the individual can perform better in the future. Learning and assessment practices on the learning programme can be regarded as formative if: there is evidence of the student’s achievement; that evidence is elicited, interpreted, and used by the teacher, the individual student and their fellow students; and such evidence is used by the teacher with the specific intention of deciding on the subsequent steps in the teaching-and-learning process (i.e. ‘instruction’ with the intention of further developing learning). The interaction between the teacher and their student(s) is formative when it influences the learner’s cognition: the teacher’s external stimulus and feedback triggers an internal production by the individual student. Alternatively, assessment standards can be used to summarise levels of achievement at group, school or national levels. In short, they can be used summatively or formatively. In the European School System, summative forms of assessment take priority over formative forms of assessment, sometimes to the detriment of learning processes.

In the next chapter we examine the external relations of the system; that is, the relations between the European schools and the EU higher education system; relations between the curriculum offered in the European schools and the curriculum and assessment arrangements in European nations; and the relations between the European Baccalaureate and other Baccalaureate and final examinations systems in the rest of Europe.

Author information

Authors and Affiliations

UCL Institute of Education, London, UK
Sandra Leaton Gray, David Scott & Peeter Mehisto

Rights and permissions

This chapter is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the licence and re-use information, please contact the Rights and Permissions team.

About this chapter

Cite this chapter

Leaton Gray, S., Scott, D., Mehisto, P. (2018). Schooled and Ready: Assessment Reform. In: Curriculum Reform in the European Schools. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-319-71464-6_5

DOI: https://doi.org/10.1007/978-3-319-71464-6_5
Published: 30 May 2018
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-319-71463-9
Online ISBN: 978-3-319-71464-6
eBook Packages: Education, Education (R0)