Matches in SemOpenAlex for { <https://semopenalex.org/work/W52399290> ?p ?o ?g. }
Showing items 1 to 80 of 80, with 100 items per page.
- W52399290 abstract "Substantial sums of money are invested by governments in state, national and international testing programs. Australia in particular engages at all three levels. There are a number of purposes served by these programs. One of these is to report student performance against standards. Standard-setting exercises with respect to a particular assessment are commonly used by testing programs where there is a requirement to determine the point at which it can be said that students have demonstrated achievement of a standard. Several methodologies have been devised that use expert judgements to derive a numerical cut-score on an achievement scale. A commonly used standard-setting methodology is one proposed by Angoff (1971). The kernel of the Angoff procedure is the independent judgement of the probability that a minimally competent person can or cannot answer a dichotomously scored item correctly. This methodology typically involves three stages: orientation and training, a first round of performance estimation followed by feedback, and then a second round of performance estimation. In the orientation session, judges are asked to define a hypothetical target group. This definition is dependent upon the judges' tacit understanding of the standard. For example, in the context of a mathematics test, judges would be asked to agree on the skills that students should be expected to have mastered. Then they would be asked to envisage a student with those skills and to estimate the proportion of a hypothetical group of equally competent students (as defined by the expected standard) who would answer each item correctly. This proportion is the estimate of the required probability.
Then the sum of these probabilities is taken as the raw cut-score on a test composed of the items. Several studies, however, question the validity of the Angoff methodology because of the finding that judges were unable to perform the fundamental task required of them: to estimate the probability that a student would answer an item correctly (Shepard, 1995), even for groups of students who are well known to them (Impara and Blake, 1996). In addition, standard-setting exercises invariably take place in situations where the reporting of educational standards has a high profile and is of political importance. To address the accountability requirements that accompany such a task, a wide range of stakeholders are invited to act as judges in the exercises. Inevitably, however, variability between the judges' conceptions of the standard, as represented by the cut-score set by each of them, causes concern. Can the public have confidence in the standard set if the judges themselves cannot agree? Several studies report the introduction of further rounds of performance estimation and more refined feedback in an attempt to obtain greater consistency between the judges' ratings (Impara and Blake, 2000; McGinty and Neel, 1996; Reckase, 2000). In a more recent study, Green, Trimble and Lewis (2003) implemented three standard-setting procedures to set cut-scores and required judges to synthesise the results to establish final cut-points.
Green et al. report studies, such as Impara and Blake (2000), in which convergence of results among multiple standard settings is used as evidence of the validity of cut-scores, but note that while convergence may occur to a reasonable degree when variations of the same method are used, there are few reports of convergence when different procedures are used. The factor distinguishing this study from the standard-setting exercises reported in the literature, which rely on judges' tacit understanding of the standard, is the existence of an explicitly and operationally defined standard. In 1996 the Australian Ministers for Education agreed to a national framework for reporting student achievement in literacy and numeracy, and arising from this decision was the drafting of benchmark standards against which the achievement of students in years 3, 5, 7 and 9 could be reported. The benchmark standards are articulated in two components: criteria describe the skills that students need to have acquired if they are to be said to have achieved the standard, and sample works exemplify these criteria. The setting of standards independently of placing them on a scale permitted a more rigorous assessment of the effects of different designs on the setting of cut-scores. Two different standard-setting methodologies have been employed in this study to translate descriptions of the standards into cut-scores. One draws on the Angoff method and involves the use of a rating scale. Judges consider the items of a test and indicate the probability that a student at the cut-score will answer each item correctly. The probabilities are in increments of 0.10, ranging from 0.0 to 1.0. The sum of the probabilities that a judge gives to the items is taken as the raw-score cut-score from that judge. The second study involves a method of pairwise comparison of the same items together with items that are operationalised to be benchmark items.
The judge has to decide which of each pair of items is the more difficult. The results of the two benchmark-setting designs appear to support findings from other standard-setting exercises reported in the literature, namely: (i) judges were unable to estimate absolute item difficulty for a student of prescribed ability; (ii) where two different designs were used, there was no convergence in results; and (iii) ratings from different judges within each design varied widely. To indicate the resultant discrepancy in setting the benchmark on the same test, the rating methodology gives a value of 16.08 and the pairwise methodology a value of 7.10 on ostensibly the same scale. A closer examination of the judges' ratings, however, suggests that despite the evidence of dramatically different cut-scores between the two exercises, the judges were highly consistent in their interpretation of relative item difficulty. Two lines of evidence indicate this high level of internal consistency: (i) the reliability index for the pairwise data; and (ii) the correlation between the item estimates obtained from the rating and pairwise exercises, which was 0.95. In addition, the correlations of the relative item difficulties with those obtained from students responding to the same items were a satisfactory 0.80 and 0.74 for the rating and pairwise designs, respectively. The high correlation between judgements across the two exercises, in conjunction with the relatively high correlation of the item difficulties from the judges' data and from the student data, suggests that the problems observed in the literature do not arise because judges cannot differentiate the relative difficulties of the items. Accordingly, the units of scale, as assessed by the standard deviations of the item difficulties, were calculated and examined.
The standard deviation of the item difficulties from the judges in the likelihood design was half that of the item difficulties from the student responses, and the standard deviation of the items from the pairwise design was over twice that of the student scale. The substantial difference between the standard deviations suggests a difference between the units of scale, which presents a fundamental problem for common equating. In the literature, the unit of scale, as evidenced by the standard deviations, is generally not considered; it is simply assumed that the students, the judges and each design produce the same unit of scale. Then, if the results of different modes of data collection do not arrive at the same or very similar cut-scores, it is not considered that this might be merely a result of different units of scale. In retrospect, it is not surprising that different formats of data collection produce different units of scale, and that different cut-scores result. In addition, it is not surprising that these might also produce a unit of scale different from that produced by the responses of the students. The reasons that the different designs are likely to produce different units of scale are considered in the thesis. Differences in the unit of scale will inevitably have an impact on the location of the benchmark or cut-score. When the difference in standard deviation is accounted for, and the cut-scores are placed on the same scale as that produced by the students, the two exercises provide similar locations of the benchmark cut-score. Importantly, the thesis shows that these locations can be substantiated qualitatively as representing the defined standard. There are two main conclusions of the study. First, some of the problems reported in the literature in setting benchmarks can be attributed to differences in the units of scale in the various response formats of judges relative to those of students.
Second, this difference in unit of scale needs to be taken into account when locating the standard on the student scale. This thesis describes in detail the two cut-score-setting designs for the data collection, and the transformations necessary to locate the benchmark on the same scale as that produced by the responses of the students." @default.
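The abstract above describes two quantitative steps: summing a judge's per-item probabilities (in increments of 0.10) to obtain an Angoff raw cut-score, and adjusting for a difference in unit of scale by rescaling judge-derived item locations by the ratio of standard deviations before placing cut-scores on the student scale. A minimal sketch of both steps, using entirely hypothetical numbers (the ratings and item locations below are illustrative, not data from the thesis):

```python
# Sketch of the two computations described in the abstract; all numbers are
# hypothetical illustrations, not values from the thesis.
from statistics import mean, stdev

# One judge's Angoff probability ratings (increments of 0.10) for a 10-item test.
ratings = [0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3, 0.2]
raw_cut_score = sum(ratings)  # Angoff raw cut-score for this judge

# Hypothetical item-difficulty estimates from the judges' data and from the
# student responses; the judges' spread differs from the students' spread,
# i.e. the two sets are on different units of scale.
judge_locs = [-1.2, -0.8, -0.3, 0.1, 0.5, 1.0]
student_locs = [-2.3, -1.5, -0.7, 0.2, 1.1, 2.0]

# Equalise the unit of scale: stretch the judge locations about their mean by
# the ratio of standard deviations, so their spread matches the student scale.
ratio = stdev(student_locs) / stdev(judge_locs)
m = mean(judge_locs)
rescaled = [m + (x - m) * ratio for x in judge_locs]
```

A cut-score from the judges' data would be transformed in the same way before being compared with, or reported on, the student scale; the choice of anchoring the transformation at the judges' mean is one simple option, not necessarily the one used in the thesis.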
- W52399290 created "2016-06-24" @default.
- W52399290 creator A5011848357 @default.
- W52399290 date "2008-07-27" @default.
- W52399290 modified "2023-09-23" @default.
- W52399290 title "Accounting for Unit of Scale in Standard Setting Methodologies" @default.
- W52399290 cites W134018459 @default.
- W52399290 cites W1968961518 @default.
- W52399290 cites W1970133260 @default.
- W52399290 cites W1970931621 @default.
- W52399290 cites W1976172684 @default.
- W52399290 cites W1980607710 @default.
- W52399290 cites W1983678109 @default.
- W52399290 cites W1996829812 @default.
- W52399290 cites W2007667963 @default.
- W52399290 cites W2028254282 @default.
- W52399290 cites W2037617768 @default.
- W52399290 cites W2051842130 @default.
- W52399290 cites W2085451502 @default.
- W52399290 cites W2107229891 @default.
- W52399290 cites W2116983997 @default.
- W52399290 cites W2122010535 @default.
- W52399290 cites W2127374659 @default.
- W52399290 cites W2132483084 @default.
- W52399290 cites W2163226076 @default.
- W52399290 cites W2563024385 @default.
- W52399290 cites W2727708856 @default.
- W52399290 cites W2763110165 @default.
- W52399290 cites W27682185 @default.
- W52399290 cites W301892227 @default.
- W52399290 cites W35090256 @default.
- W52399290 hasPublicationYear "2008" @default.
- W52399290 type Work @default.
- W52399290 sameAs 52399290 @default.
- W52399290 citedByCount "0" @default.
- W52399290 crossrefType "book" @default.
- W52399290 hasAuthorship W52399290A5011848357 @default.
- W52399290 hasConcept C121955636 @default.
- W52399290 hasConcept C122637931 @default.
- W52399290 hasConcept C144133560 @default.
- W52399290 hasConcept C145420912 @default.
- W52399290 hasConcept C205649164 @default.
- W52399290 hasConcept C2778755073 @default.
- W52399290 hasConcept C33923547 @default.
- W52399290 hasConcept C58640448 @default.
- W52399290 hasConceptScore W52399290C121955636 @default.
- W52399290 hasConceptScore W52399290C122637931 @default.
- W52399290 hasConceptScore W52399290C144133560 @default.
- W52399290 hasConceptScore W52399290C145420912 @default.
- W52399290 hasConceptScore W52399290C205649164 @default.
- W52399290 hasConceptScore W52399290C2778755073 @default.
- W52399290 hasConceptScore W52399290C33923547 @default.
- W52399290 hasConceptScore W52399290C58640448 @default.
- W52399290 hasLocation W523992901 @default.
- W52399290 hasOpenAccess W52399290 @default.
- W52399290 hasPrimaryLocation W523992901 @default.
- W52399290 hasRelatedWork W1973784071 @default.
- W52399290 hasRelatedWork W1999622336 @default.
- W52399290 hasRelatedWork W2006235203 @default.
- W52399290 hasRelatedWork W2018597994 @default.
- W52399290 hasRelatedWork W2057178796 @default.
- W52399290 hasRelatedWork W2071594253 @default.
- W52399290 hasRelatedWork W2088499697 @default.
- W52399290 hasRelatedWork W2095508883 @default.
- W52399290 hasRelatedWork W2316903821 @default.
- W52399290 hasRelatedWork W2318427134 @default.
- W52399290 hasRelatedWork W2402553922 @default.
- W52399290 hasRelatedWork W244263544 @default.
- W52399290 hasRelatedWork W2483080699 @default.
- W52399290 hasRelatedWork W2980636171 @default.
- W52399290 hasRelatedWork W3124904075 @default.
- W52399290 hasRelatedWork W3126055109 @default.
- W52399290 hasRelatedWork W340278688 @default.
- W52399290 hasRelatedWork W88582300 @default.
- W52399290 hasRelatedWork W2114505605 @default.
- W52399290 hasRelatedWork W2599984322 @default.
- W52399290 isParatext "false" @default.
- W52399290 isRetracted "false" @default.
- W52399290 magId "52399290" @default.
- W52399290 workType "book" @default.