Minerva Elements Records

Search Results

Now showing 1 - 10 of 12
  • Item
    Actual Interpretations and Use of Scores as Aspects of Validity
    O'Leary, TM ; Hattie, JAC ; Griffin, P (WILEY, 2017-06-01)
    Validity is the most fundamental consideration in test development. Understandably, much time, effort, and money are spent in its pursuit. Central to the modern conception of validity are the interpretations made, and uses planned, on the basis of test scores. There is, however, evidence that test users have difficulty understanding scores as intended. That is, although the proposed interpretations and uses of test scores might be theoretically valid, they might never come to be because the meaning of the message is lost in translation. This should give pause. It is almost absurd to think that the intended interpretations and uses of test scores might fail because they are not aligned with the actual interpretations made and uses enacted by the audience. Despite this, contributions to the literature regarding the interpretability of score reports, the mechanisms by which scores are communicated to their audience, and their relevance to validity have appeared only recently. These contributions have focused on linking, through evidence, the intended interpretation and use with the actual interpretations being made and actions being planned by score users. This article reviews the current conception of validity, validation, and validity evidence with the goal of positioning the emerging notion of validity of usage within the current paradigm.
  • Item
    Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program
    Zoanetti, N ; Beaves, M ; Griffin, P ; Wallace, EM (BMC, 2013-03-04)
    BACKGROUND: Despite the widespread use of multiple-choice assessments in medical education, current practice and published advice concerning the number of response options remain equivocal. This article describes an empirical study contrasting the quality of three 60-item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. METHODS: The first form featured four response options per item. The second form featured three response options, having removed the least functional option from each item in its four-option counterpart. The third test form was constructed by retaining the better-performing version of each item from the first two test forms; it contained both three- and four-option items. RESULTS: Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement, and fit to the item response model. CONCLUSIONS: The position taken is that decisions about the number of response options should be made at the item level, with plausible options added to complete each item on both psychometric and educational grounds rather than in compliance with a uniform policy. The aim is to construct the best-performing version of each item, providing the best psychometric and educational information.
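    The reliability criterion used to compare the test forms above can be illustrated with a short sketch. This is not code from the study; it is a minimal illustration, assuming a simple 0/1 (correct/incorrect) score matrix, of Cronbach's alpha, a standard internal-consistency estimate for multiple-choice tests:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) matrix of 0/1 scores."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_vars / total_var)

# Toy data: 6 examinees answering 4 items (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
alpha = cronbach_alpha(scores)  # ≈ 0.667
```

    Comparing alpha (alongside measurement error and model fit) across the three-, four-, and mixed-option forms is the kind of evidence the abstract describes.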
  • Item
    Measuring Collaborative Problem Solving Using Mathematics-Based Tasks
    Harding, S-ME ; Griffin, P ; Awwal, N ; Alom, M ; Scoular, C (American Educational Research Association, 2017-07)
    This study describes an online method of measuring individual students’ collaborative problem-solving abilities using four interactive mathematics-based tasks, with students working in pairs. Process stream data were captured from 3,000 students who completed the tasks in the United States, Australia, Canada, Costa Rica, Singapore, and Finland. The data were transformed into indicators of collaborative problem-solving ability and were analyzed using item response modeling. The assessments employed in this study can be used as a teaching tool for introducing algebraic concepts and as a measurement instrument for collaborative problem-solving ability. The paper describes the construction, calibration, and reliability of the tasks and considers validation issues, such as fairness between assessments for both partners and avoidance of cultural biases. Investigations into the dependencies between student scores provide evidence for convergent and discriminant validity.
  • Item
    Developmental assessment: lifting literacy through professional learning teams
    Griffin, P ; Murray, L ; Care, E ; Thomas, A ; Perri, P (ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD, 2010)
  • Item
    Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment
    Zoanetti, N ; Griffin, P ; Beaves, M ; Wallace, EM (BMC, 2009-04-29)
    BACKGROUND: It is widely recognised that deficiencies in fetal surveillance practice continue to contribute significantly to the burden of adverse outcomes. This has prompted the development of evidence-based clinical practice guidelines by the Royal Australian and New Zealand College of Obstetricians and Gynaecologists and an associated Fetal Surveillance Education Program to deliver the associated learning. This article describes initial steps in the validation of a corresponding multiple-choice assessment of the relevant educational outcomes through a combination of item response modelling and expert judgement. METHODS: The Rasch item response model was employed for item and test analysis and to empirically derive the substantive interpretation of the assessment variable. This interpretation was then compared to the hierarchy of competencies specified a priori by a team of eight subject-matter experts. Classical Test Theory analyses were also conducted. RESULTS: A high level of agreement between the hypothesised and derived variable provided evidence of construct validity. Item and test indices from Rasch analysis and Classical Test Theory analysis suggested that the current test form was of moderate quality. However, the analyses made clear the required steps for establishing a valid assessment of sufficient psychometric quality. These steps included: increasing the number of items from 40 to 50 in the first instance, reviewing ineffective items, targeting new items to specific content and difficulty gaps, and formalising the assessment blueprint in light of empirical information relating item structure to item difficulty. CONCLUSION: The application of the Rasch model for criterion-referenced assessment validation with an expert stakeholder group is herein described. Recommendations for subsequent item and test construction are also outlined in this article.
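    The Rasch model used in the analysis above has a simple closed form. As an illustration only (not code from the study), the probability that a person of ability theta answers an item of difficulty b correctly can be sketched as:

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose ability equals the item's difficulty has a 50% chance of
# answering correctly; easier items (lower b) raise that probability.
p_matched = rasch_prob(theta=0.0, b=0.0)   # 0.5
p_easy = rasch_prob(theta=0.0, b=-2.0)     # ≈ 0.88
```

    Item difficulty estimates obtained by fitting this model to response data yield the empirical ordering of items that the study compared against the experts' hypothesised hierarchy of competencies.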
  • Item
  • Item
    Standards-referenced assessment for vocational education and training in schools
    Griffin, P ; Gillis, S ; Calvitto, L (AUSTRALIAN COUNCIL EDUCATIONAL RES LIMITED, 2007-04)
    This study examined a model of assessment that could be applied nationally for Year Twelve Vocational Education and Training (VET) subjects and that could yield both a differentiating score and recognition of competence. More than fifty colleges across all states and territories of Australia field-tested the approach over one school year. Results showed that a standards-referenced model could be used; that the approach was compatible with the diverse range of senior secondary assessment systems in use throughout Australia; and that there were considerable cost benefits in adopting the logic of item response modelling to develop rubrics for scoring performances on units of competence from National Training Packages. A change in the logic of competency assessment was proposed: rather than rating performance indicators dichotomously, a series of quality-ordered criteria would indicate how well students performed specified tasks in the workplace or its simulation. The study validated the method of assessment development, demonstrated its consistency, and showed how it could address the issue of consistency across states. The study also proposed a set of principles for a joint assessment of both quality and competence.
  • Item
    Reading achievements of Vietnamese Grade 5 pupils
    Griffin, P ; Thi Thanh, M (Informa UK Limited, 2006-01-01)
  • Item
    A 20-year study of mathematics achievement
    Griffin, P ; Callingham, R (NATL COUNCIL TEACHERS MATHEMATICS-NCTM, 2006-05)
  • Item