Chapter II. Types of assessing writing skills

Assessing learners' writing proficiency through direct measurement
All the issues raised thus far regarding direct and indirect measures bear on one overriding question: What is the best way to assess writing ability: essay tests, multiple-choice tests, or some combination of methods? The answer to this question involves psychometric, pedagogical, and practical concerns.
Direct measure: The use of a direct measure as the sole test of writing ability may appeal to English teachers, but it may also prove inadequate, especially in a large testing program. Essay examinations, for one thing, simply are not reliable enough to serve alone for making crucial decisions about individuals. A population like that taking the GRE General Test could pose even bigger reliability problems. Coffman (1971) writes, "The level [of reliability] will tend to be lower if the examination is administered to a large group with heterogeneous backgrounds rather than to a small group whose background of instruction is homogeneous. It will tend to be lower if many different raters have to be trained to apply a common standard of evaluation in reading the papers. It will tend to be lower for less exact and more discursive subject matters," especially for a "discipline-free" or "general composition topic" (pp. 278-279). And as Coffman observes, attaining a high reading reliability through multiple readings "does not insure that score reliability will also be high.... It does not do much good to achieve high rating reliability (usually at considerable expense), for example, if the sample is so inadequate that the test [score] reliability remains low." He reminds us that the rating reliability of a single multiple-choice item "is likely to be in excess of .99, yet nobody would confuse this coefficient with the reliability of a test made up of two or three such items" (pp. 281, 297).
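Coffman's distinction between rating reliability and score reliability can be made concrete with the standard Spearman-Brown prophecy formula, which predicts how pooling independent readings of the same paper raises rating reliability. The sketch below is an illustration only, not Coffman's own computation; the single-reading reliability of 0.40 is an assumed figure chosen merely to show the diminishing-returns pattern.

    def spearman_brown(r_single: float, k: int) -> float:
        """Reliability of the average of k independent readings,
        given the reliability of one reading (Spearman-Brown)."""
        return k * r_single / (1 + (k - 1) * r_single)

    # Assumed reliability of a single reading of one essay (illustrative).
    r1 = 0.40

    for k in (1, 2, 3, 5, 10):
        print(f"{k} readings: rating reliability = {spearman_brown(r1, k):.3f}")

    # Output: 0.400, 0.571, 0.667, 0.769, 0.870 -- each added reader helps less.
    # Note that this models rater error only; it says nothing about
    # topic-sampling error, which is why high rating reliability does not
    # guarantee high score reliability when the student writes only one topic.

The formula sharpens Coffman's warning: re-reading the same single sample can buy an impressive-looking coefficient while the underlying sample of writing remains too thin to support decisions about individuals.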
Coffman (1966) also remarks that "placing undue emphasis on reliability when evaluating essay tests may lead to unwarranted conclusions. If essay tests are to be compared with objective tests, it is the correlations of each with a common reliable criterion measure, not their respective reliability coefficients, which should be compared" (p. 156). But by this standard too, essay tests used alone are found wanting. As Godshalk et al. (1966) report, "For the essays, the first-order validities do not approach the range of validities for other [i.e., the multiple-choice] types of questions until the score is based on three readings" (p. 41). In a field-trial follow-up to the original study, the authors determined that even when essays received four readings, the average correlation of an essay's total score with the criterion measure was substantially lower than that of the sentence correction or usage score (p. 34).
These findings confirm "the common opinion that a single topic is inadequate" (McColly, 1970, p. 152) on grounds of validity and reliability for assessing an individual's writing ability. Kincaid established as early as 1953 that "a single paper written by a student on a given topic at a particular time cannot be considered as a valid basis for evaluating his achievement," especially if the student is above average (cited in Braddock et al., 1963, p. 92). Lee Odell (1981) argues, "The ability to do one sort of writing task may not imply equal ability with other kinds of tasks.... We need to evaluate several different kinds of writing performance," unless only one kind (for example, the persuasive essay) is of interest. Moreover, "we must have at least two samples of the student's writing for each kind of writing" (pp. 115, 118). And Paul Diederich has maintained that two themes are "totally inadequate" (cited in Braddock et al., 1963, p. 7n). They certainly appear so unless each is given numerous readings.
Research by Coffman (1966) has compared the importance of multiple topics to that of multiple readings. The value of reading reliability (r_a), naturally, depends more on the number of readings per topic than on the number of topics read. For example, r_a = .701 for five topics read once each and .759 for one topic read five times. But values for validity (r_c1) and score reliability (r_11) depend more on the number of topics read than on the number of readings per topic. Since the validity of an examinee's score is clearly more important than its reading reliability, it would seem necessary for a valid direct assessment to require several essays from each student.
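A quick check connects Coffman's one-topic figure to the Spearman-Brown formula sketched above: inverting the formula for the reported five-reading value recovers the single-reading reliability it implies. This is an after-the-fact illustration under the assumption that Spearman-Brown applies to the pooled figure, not a computation from Coffman's own data.

    # Inverting Spearman-Brown: given pooled reliability r_k over k readings,
    # the implied single-reading reliability is r_1 = r_k / (k - (k - 1) * r_k).
    def implied_single_reading(r_k: float, k: int) -> float:
        return r_k / (k - (k - 1) * r_k)

    r1 = implied_single_reading(0.759, 5)  # .759 = one topic read five times
    print(f"implied single-reading reliability: {r1:.3f}")  # ~0.386

In other words, the .759 coefficient rests on single readings worth roughly .39 each, which underlines how much of the apparent precision is bought by re-reading one and the same sample of writing rather than by sampling more of the examinee's ability.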
Note, however, that values for all figures tend to be highest when R (the number of readings per topic) times T (the number of topics) is maximum for any number n where n = R + T. For example, two topics read twice each yield higher figures than either one topic read three times or three topics read once each. Coffman (1966) considers the validity coefficients reported by Godshalk et al. (1966) for the one-hour objective composite of subtests and concludes, "In order to obtain validity coefficients of comparable magnitude using only essay tests, it would be necessary to assign at least two topics to each student and have each read by five different readers or to assign three topics and have each read by three different readers" (p. 156). Large-scale assessments rarely find it possible to give more than one topic with two independent readings.
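The R + T constraint is simple arithmetic: for a fixed total effort n, the product R x T is largest when readings and topics are balanced. A few lines make Coffman's n = 4 example explicit.

    # For a fixed total n = R + T, enumerate every (readings, topics) design
    # and its product R * T, which the text says tracks the overall figures.
    n = 4
    for R in range(1, n):
        T = n - R
        print(f"R={R} readings/topic, T={T} topics: R*T = {R * T}")

    # Output:
    # R=1, T=3: 3
    # R=2, T=2: 4   <- two topics read twice each scores highest
    # R=3, T=1: 3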

