School of Languages and Linguistics - Theses

  • Item
    Applying the context-adaptive model: evaluating a DEET funded English Language Program
    Ducasse, Ana Maria (1995)
    Financially able governments around the world are embarking on major projects to retrain the growing numbers of unemployed. Education systems are now dominated by 'market economy'-thinking government bodies holding the reins on policy making and funding. It would appear from the writing of Bell and Goldstein (1995:21) that the situation in Australia is parallel to that of Canada. It is summarised in the words: "Many workers who have permanently lost their jobs in this current economic recession have been advised to upgrade their educational credentials and obtain new work skills. In these changing economic times, upgrading, training and 'lifelong learning' are seen by many to be the key to finding and keeping a good job." This statement could easily be made about Australia, where the Federal Government is funding many types of training programs for the unemployed. The one being evaluated here is an English as a Second Language (ESL) program funded by the Department of Employment, Education and Training (DEET) for retrenched workers from the Textile, Clothing and Footwear (TCF) industry. The program to be discussed is located at Victoria College, a registered private provider of education and training in Melbourne. Initially, the college offered English Language Intensive Courses for Overseas Students (ELICOS) accredited by an industry body, the National ELICOS Accreditation Scheme (NEAS). It has since broadened its scope to offer business and DEET-funded courses. The evaluator has been closely connected to the program in the capacity of teacher, coordinator and DEET liaison officer. The first chapter presents the historical background of language program evaluations. Reports on outcomes from closely related areas are presented next, as relevant background literature. The model chosen for the framework of this evaluation is the Context-Adaptive Model (CAM) (Lynch 1990).
    The second chapter leads to an evaluation design by adapting the steps of the model to the evaluation context. It takes into consideration "such issues as the social and political basis and motivation for the language learning and teaching" (Lynch, in press), which are important background to the evaluation. The data collection design is presented in the third chapter, together with the thematic framework for the evaluation. The design has quantitative and qualitative data collected for separate audience goals. The fourth chapter shows how qualitative and quantitative data are collected from various sources. The qualitative data consist of post-course questionnaires, case studies and interviews. The quantitative data consist of Australian Second Language Proficiency Rating (ASLPR) results in the form of pre-course and post-course proficiency ratings for all the students, as well as a two-year charting of the four macro-skills for the case studies. The results are discussed in the fifth chapter, and arguments for the validation of the data and methods are put forward in the sixth. The evaluation conclusions are then drawn from the different perspectives presented in the last chapter.
  • Item
    The process of the assessment of writing performance: the rater's perspective
    Lumley, Thomas James Nathaniel (2000)
    The primary purpose of this study is to investigate the process by which raters of texts written by ESL learners make their scoring decisions. The context is the Special Test of English Proficiency (STEP), used by the Australian government to assist in immigration decisions. Four trained, experienced and reliable STEP raters took part in the study, providing scores for two sets of 24 texts. The first set was scored as in an operational rating session. Raters then provided think-aloud protocols describing the rating process as they rated the second set. Scores were compared under the two conditions, and comparisons were made with the raters' operational rating behaviour. Both similarities and differences were observed. A coding scheme developed to describe the think-aloud data allowed analysis of the sequence of rating, the interpretations the raters made of the scoring categories in the analytic rating scale, and the difficulties raters faced in rating. Findings demonstrate that raters follow a fundamentally similar rating process, in three stages. With some exceptions, they appear to hold similar interpretations of the scale categories and descriptors, but the relationship between scale contents and text quality remains obscure. A model is presented describing the rating process. This shows that rating is at one level a rule-bound, socially governed procedure that relies upon a rating scale and the rater training which supports it, but it retains an indeterminate component as a result of the complexity of raters' reactions to individual texts. The task raters face is to reconcile their impression of the text, the specific features of the text, and the wordings of the rating scale, thereby producing a set of scores. The rules and the scale do not cover all eventualities, forcing the raters to develop various strategies to help them cope with problematic aspects of the rating process.
In doing this they try to remain close to the scale, but are also heavily influenced by the complex intuitive impression of the text obtained when they first read it. This sets up a tension between the rules and the intuitive impression, which raters resolve by what is ultimately a somewhat indeterminate process. In spite of this tension and indeterminacy, rating can succeed in yielding consistent scores provided raters are supported by adequate training, with additional guidelines to assist them in dealing with problems. Rating requires such constraining procedures to produce reliable measurement.
  • Item
    Introducing EFL speaking tests into a Japanese senior high school entrance examination
    Akiyama, Tomoyasu (2004)
    This thesis investigates the feasibility of introducing speaking tests into the existing English test of the senior high school entrance examination in Japan by employing Messick's (1989) validity framework. The study demonstrates that validity investigations need to include not only psychometric analysis but also a consideration of the competing values of stakeholders. The teaching guidelines for English issued by the Japanese Ministry of Education (1998) state that speaking is one of the most important skills for junior high school students. An entry decision to senior high school is based on both school-based assessment implemented by junior high school teachers and the existing external standardized English test. Despite the emphasis on the development of speaking skills, the existing English test does not include the assessment of speaking skills. There is a clear discrepancy between the aims of the guidelines and the skills tested in the entrance examination. A way to bridge this gap could be to introduce speaking tests into the English test of the senior high school entrance examination, a step that would necessitate considering the validity of such a test. The major issue of test validity relates to the meaning, relevance and utility of test scores, as well as the value implications of test scores and the social consequences of test use (e.g. Messick, 1989; Bachman, 1990; McNamara, 2001). A questionnaire survey of teachers and students, and interviews with government officials and academics responsible for the test, were used to ascertain stakeholders' attitudes towards the introduction of speaking tests and their views of possible washback effects on the teaching of English (Study 1).
    In order to respond to concerns expressed by stakeholders in Study 1 about the reliability, validity and practicality of tests assessing oral skills, a possible oral skills component in the existing test was developed and trialled, and test scores were analysed, focusing on the practicality of the administration and the psychometric adequacy of the test, investigating student ability, raters, tasks and items via Rasch measurement (Study 2). Study 1 revealed that while most stakeholders were positive about the introduction of speaking tests, two stakeholder groups, the Education Board and senior high school teachers, were not. The former, the test developers, took a conservative approach in wanting to maintain the status quo, and the latter, the test administrators, were resistant to the introduction of speaking tests for complex reasons, both internal and external. The views held by these two stakeholder groups are major obstacles to introducing such a test. Preliminary findings from Study 2 showed that the speaking tests developed were psychometrically adequate to measure junior high school students' oral skills. This study demonstrates that careful consideration needs to be given to the possible psychological fear aroused in stakeholders by the changes that would occur if speaking tests were included in the senior high school entrance examination. These changes can also challenge the values that underpin the existing educational system, at both the institutional and individual level, with different groups of stakeholders holding competing values. Clearly, taking these values into account is important in investigating the feasibility of introducing speaking tests into the entrance examinations, in that any future oral skills component would challenge the existing examination, which embodies ideological, political and educational values - for as Messick argues, validity needs to be viewed in terms that go beyond psychometric rigour.
The thesis concludes with a discussion on the implications for validity theory and the development of language assessment policy.
  • Item
    An investigation into the validity of two EFL (English as a Foreign Language) listening tests: IELTS and TOEFL iBT
    Nguyen, Thi Nhan Hoa (2008)
    This study is an investigation of the construct validity of two EFL listening tests: IELTS and TOEFL iBT. It aimed to answer the question: "How do IELTS and TOEFL iBT listening tests compare in terms of test construct?" (For complete abstract open document.)
  • Item
    Assessing the second language proficiency of health professionals
    McNamara, Timothy Francis (1990)
    This thesis reports on the development of an Australian Government English as a Second Language test for health professionals, the Occupational English Test (OET), and its validation using Rasch Item Response Theory (IRT) models. The test contains sub-tests of the four macro-skills, each based on workplace communication tasks. The thesis reports on the creation of test specifications, the trialling of test materials and the analysis of data from full test sessions. The main research issues dealt with are as follows:
    1. The nature of the constructs involved in communicative language testing. The term proficiency is analysed, and its relationship to a number of models of communicative competence examined. The difficulty of incorporating into these models factors underlying test performance is identified.
    2. The nature of performance tests. A distinction is introduced between strong and weak senses of the term performance test, and related to the discussion in 1 above.
    3. The content validity of the OET. This is established on the basis of a questionnaire survey, interviews, examination of relevant literature, workplace observation and test data.
    4. The role of classical and Rasch IRT analysis in establishing the qualities of the test. Classical and Rasch IRT analyses are used to establish the basic reliability of the OET sub-tests. The Writing sub-test is shown to be somewhat problematic for raters because of the nature of the writing task involved. Analysis of data from the Reading sub-test demonstrates the superiority of the Rasch analysis in the creation of short tests with a specific screening function.
    5. The role of Rasch IRT analysis in investigating the construct and content validity of the test, and hence of communicatively oriented tests in general. Rasch analysis reveals that the sub-tests are satisfactory operationalizations of the constructs 'ESL listening/speaking/reading/writing ability in health professional contexts'. For the Speaking and Writing sub-tests, the analysis reveals that responses of raters in categories associated with perceptions of grammatical accuracy have a more important role in the determination of the candidate's total score than was anticipated in the design of the test. This finding has implications for the validity of communicatively oriented tests in general, and illustrates the potential of IRT analysis for the investigation of the construct validity of tests.
    6. The appropriateness of the use of Rasch IRT in the analysis of language tests. The nature of the debate about 'unidimensionality' in Rasch analysis is reviewed. It is argued that the issue has been substantially misunderstood. Data from the two parts of the Listening sub-test are analysed, and statistical tests are used to confirm the unidimensionality of the data set. It is concluded that Rasch analysis is appropriate for a language test of this type.
    7. The behaviour of raters in the rating of oral and written production in a second language. The findings reported in 5 above suggest that the behaviour of raters is crucial to understanding what is being measured in a communicative test of the productive language skills.
    The research demonstrates the value of Rasch IRT analysis in the empirical validation of communicatively oriented language tests, and the potential of large-scale test development projects for theoretical work on language testing.
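    Several of the theses above rely on Rasch measurement. As a brief illustrative sketch, not drawn from any of these studies, the dichotomous Rasch model expresses the probability of a correct response as a logistic function of the difference between person ability and item difficulty, both placed on a common logit scale:

    ```python
    import math

    def rasch_probability(ability: float, difficulty: float) -> float:
        """Dichotomous Rasch model: probability of a correct response (X = 1)
        given person ability and item difficulty, both measured in logits."""
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

    # When ability equals difficulty, the model predicts a 50% success rate.
    print(rasch_probability(0.0, 0.0))  # 0.5

    # A person one logit above an item's difficulty succeeds about 73% of the time.
    print(round(rasch_probability(1.0, 0.0), 2))  # 0.73
    ```

    Rater-mediated assessments such as the OET Speaking and Writing sub-tests use polytomous extensions of this model (e.g. rating scale or partial credit models), which generalise the same ability-minus-difficulty logic to ordered score categories; those extensions are not sketched here.
    
    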