School of Languages and Linguistics - Theses

  • Item
    The process of the assessment of writing performance: the rater's perspective
    Lumley, Thomas James Nathaniel (2000)
    The primary purpose of this study is to investigate the process by which raters of texts written by ESL learners make their scoring decisions. The context is the Special Test of English Proficiency (STEP), used by the Australian government to assist in immigration decisions. Four trained, experienced and reliable STEP raters took part in the study, providing scores for two sets of 24 texts. The first set was scored as in an operational rating session; raters then provided think-aloud protocols describing the rating process as they rated the second set. Scores were compared across the two conditions and against the raters' operational rating behaviour, and both similarities and differences were observed. A coding scheme developed to describe the think-aloud data allowed analysis of the sequence of rating, the interpretations the raters made of the scoring categories in the analytic rating scale, and the difficulties raters faced in rating. The findings demonstrate that raters follow a fundamentally similar rating process, in three stages. With some exceptions, they appear to hold similar interpretations of the scale categories and descriptors, but the relationship between scale contents and text quality remains obscure. A model is presented describing the rating process. It shows that rating is, at one level, a rule-bound, socially governed procedure that relies upon a rating scale and the rater training which supports it, but that it retains an indeterminate component as a result of the complexity of raters' reactions to individual texts. The task raters face is to reconcile their impression of the text, the specific features of the text, and the wordings of the rating scale, thereby producing a set of scores. The rules and the scale do not cover all eventualities, forcing the raters to develop various strategies to help them cope with problematic aspects of the rating process. In doing this they try to remain close to the scale, but they are also heavily influenced by the complex intuitive impression of the text obtained when they first read it. This sets up a tension between the rules and the intuitive impression, which raters resolve by what is ultimately a somewhat indeterminate process. In spite of this tension and indeterminacy, rating can succeed in yielding consistent scores provided raters are supported by adequate training, together with additional guidelines to assist them in dealing with problems. Rating requires such constraining procedures to produce reliable measurement.
  • Item
    The importance and effectiveness of moderation training on the reliability of teacher assessments of ESL writing samples
    McIntyre, Philip N. (1993)
    This thesis reports the findings of a study of the inter-rater reliability of assessments of ESL writing by teachers in the Australian Adult Migrant Education Program, using the ASLPR, a language proficiency scale employed throughout the program. The study investigates the individual ratings assigned to 15 writing samples by 83 teachers, both before and after training aimed at moderating raters' perceptions of the descriptors in the scale by reference to features of other 'anchor' writing samples. The thesis argues for the necessity of ongoing training of assessors of ESL writing at a time of change in the program from the assessment of language proficiency to that of language competencies, since both forms of assessment increasingly have consequences which affect the lives of the candidates. The importance of and necessity for moderation training are established by reference to problems of validity in the scale itself and in its use in the program, and by reference to the literature on assessor training and on features of writing which influence rater judgements. The findings indicate that training is effective in substantially increasing the inter-rater reliability of the subjects: after training, the range of levels assigned to the samples is reduced, and the percentages of ratings at the mode (taken as the most accurate level) and within mode ± 1 (an allowance for 'error' due to the subjective nature of the assessment) increase. The thesis concludes that ongoing training is effective in achieving greater consensus, i.e. inter-rater reliability, amongst the assessors, but suggests that variability needs to be reduced further, and offers suggestions for further research aimed at other assessors and variables.
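    The mode-based agreement figures described above lend themselves to a small worked example. The Python sketch below is illustrative only: the ratings, the use of plain integer proficiency levels, and the mode_agreement helper are hypothetical, not drawn from the thesis (which reports these percentages for 83 teachers rating 15 samples on the ASLPR, whose levels are finer-grained than integers). It computes, for one sample, the percentage of ratings at the modal level and within mode ± 1:

      from statistics import mode

      def mode_agreement(ratings):
          # Percentage of ratings at the modal level and within mode +/- 1.
          # `ratings`: proficiency levels assigned to one writing sample by
          # different raters, coded here as integers for illustration.
          m = mode(ratings)
          at_mode = 100 * sum(r == m for r in ratings) / len(ratings)
          near_mode = 100 * sum(abs(r - m) <= 1 for r in ratings) / len(ratings)
          return m, at_mode, near_mode

      # Invented before/after-training ratings for a single sample:
      before = [3, 4, 4, 5, 5, 5, 6, 7]
      after = [4, 5, 5, 5, 5, 5, 5, 6]
      for label, data in (("before", before), ("after", after)):
          m, at_m, near_m = mode_agreement(data)
          print(f"{label}: mode={m}, at mode={at_m:.0f}%, mode±1={near_m:.0f}%")

    On these invented data, agreement at the mode rises from 38% to 75% and agreement within mode ± 1 from 75% to 100%, which is the shape of improvement the thesis attributes to moderation training.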