
Teacher-Examiners’ Explicit and Enacted Beliefs About Proficiency Indicators in National Oral Assessments

Chapter in Teacher Involvement in High-Stakes Language Testing

Abstract

The testing of English oral proficiency is an important part of high-stakes national examinations, in which large numbers of teachers are involved as examiners. Although the literature shows that the reliability of oral assessments is often threatened by rater variability, the role of teacher beliefs in teacher-rater judgements has to date received little attention. This exploratory qualitative study, conducted in Singapore, identified teachers’ beliefs about the construct of oral proficiency in their assessment of secondary school candidates and examined the extent to which these beliefs were enacted in real-time assessment. Seven experienced national-level examiners participated in the study. They listened to audio-recordings of four students performing an oral interview (conversation) task in a simulated examination and assessed each performance individually. Data on the teachers’ thinking, which revealed their underlying beliefs while assessing, were elicited through concurrent verbal protocol (CVP) sessions. In addition, a questionnaire was administered a month later to elicit their explicit beliefs. Findings showed that the teachers held a range of beliefs about the construct of oral proficiency, but only some of these formed the core of the criteria they expressed when assessing student performance in real time. Implications for oral assessments and further research are discussed.



Author information

Correspondence to Christine C. M. Goh.

Appendix: Teacher Beliefs About Oral Proficiency (TeBOP)

Instruction

Use the following scale (numbers 1 through 4) to describe what you think about each of the statements below. For each statement, circle the number that gives the best description of what you believe to be true.

[Figure a: descriptors for the rating scale 1 to 4]

I consider the features listed below to be important when I assess a candidate’s oral proficiency in oral interview/conversation tasks.

(Each feature below is rated on the scale 1 to 4.)

Phonology

1. Stress
2. Rhythm
3. Intonation
4. Pronunciation

Language

5. Grammar
6. Vocabulary
7. Use of standard English
8. Uses a range of sentence structures correctly

Fluency

9. Hesitation
10. Repetition
11. Restructuring sentences
12. Reselecting vocabulary

Communication strategies

13. Achievement strategies (paraphrase, circumlocution, etc.)
14. Interaction strategies (clarification, asking for repetition, etc.)
15. Avoidance strategy (avoiding unfamiliar topics)

Topical knowledge

16. Has interesting ideas
17. Elaborates ideas
18. Expresses ideas clearly
19. Gives a relevant personal response
20. Displays maturity in ideas
21. Displays breadth of knowledge
22. Displays depth of knowledge
23. Uses a range of relevant vocabulary

Discourse

24. Expresses ideas cohesively and coherently
25. Initiates discussion/conversation with the examiner
26. Concludes discussion/conversation

Personal characteristics

27. Interacts easily with the examiner
28. Enthusiastic about what he/she says
29. Responds enthusiastically to prompts
30. Shows effort
31. Good grooming
32. Confident
33. A pleasant voice

If you can, please explain the reasons for your choices in each of the categories:

Phonology

Accuracy

Fluency

Communication strategies

Topical knowledge

Discourse management

Personal characteristics

One or two other features, if any, you would like to add.

34. ______________________
35. ______________________


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Goh, C.C.M., Ang-Aw, H.T. (2018). Teacher-Examiners’ Explicit and Enacted Beliefs About Proficiency Indicators in National Oral Assessments. In: Xerri, D., Vella Briffa, P. (eds) Teacher Involvement in High-Stakes Language Testing. Springer, Cham. https://doi.org/10.1007/978-3-319-77177-9_11


  • DOI: https://doi.org/10.1007/978-3-319-77177-9_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77175-5

  • Online ISBN: 978-3-319-77177-9

