Indirect measures: The essay examination, then, appears unworkable as the sole measure of writing ability in a program that seeks to rank order candidates for selection. Scores based on two readings do not allow for reliable close discriminations but can be useful when considered along with other information. Could indirect assessment alone be adequate for the purpose of ordering and selecting candidates? Edward Noyes declared in 1963 that well-crafted objective English items "measure how well a student can distinguish between correct and incorrect expression... and how sensitive he is to appropriate or inappropriate usage"; objective English tests are "based on the assumption that these skills are so closely associated with actual writing ability that direct measurement of them will provide a reasonably accurate, if indirect, measurement of writing ability" (p. 7). As discussed, numerous studies bear out this assumption. In particular, Godshalk et al. (1966) reported, "We have demonstrated that the [objective] one-hour English Composition Test does an amazingly effective job of ordering students in the same way as a trained group of readers would after reading a sizable sample of their actual writing." Various multiple-choice item types proved "remarkably effective for predicting scores on a reliable criterion of writing ability" (pp. 21, 29). The authors added, "one hypothesis of this study was that the multiple-choice questions were discriminating primarily at the lower levels of skill.... No convincing support for this hypothesis was found" (p. 42). The objective sections worked very well alone and even better when combined.
From a psychometric point of view, it does appear that indirect assessment alone can afford a satisfactory measure of writing skills for ranking and selection purposes. Many argue persuasively that it is inadequate by itself for such purposes as charting the development of individual student writers, gauging the impact of a new writing program, gaining a full picture of a given student's strengths and limitations in various writing situations, or placing a student in one of several English classes. During the period 1977-1982, ETS's Programs for the Assessment of Writing contracted with a number of colleges to develop direct assessments. These colleges were not satisfied with their record of placement decisions based solely on TSWE scores. Richard Stiggins (1981) adds that even for selection, direct assessment is to be preferred "when writing proficiency is the sole or primary selection criterion," as for scholarships to a college writing program; indirect measures "are acceptable whenever writing proficiency is one of the many selection criteria" (p. 11). Still, the use of multiple-choice tests alone for any writing assessment purpose will elicit strong objections. Some, such as charges that the tests reveal little or nothing about a candidate's writing ability, can readily be handled by citing research studies. The question of how testing affects teaching, though, is less easily answered. Even the strongest proponents of multiple-choice English tests and the severest critics of essay examinations agree that no concentration on multiple-choice materials can substitute for "a long and arduous apprenticeship in actual writing" (Palmer, 1966, p. 289). But will such apprenticeship be required if, as Charles Cooper (1981) fears, a preoccupation with student performance on tests will "narrow" instruction to "objectives appropriate to the tests" (p. 12)? Lee Odell (1981) adds, "In considering specific procedures for evaluating writing, we must remember that students have a right to expect their coursework to prepare them to do well in areas where they will be evaluated. And teachers have an obligation to make sure that students receive this preparation. This combination of expectation and obligation almost guarantees that evaluation will influence teaching procedures and, indeed, the writing curriculum itself." Consequently, "our procedures for evaluating writing must be consistent with our best understanding of writing and the teaching of writing" (pp. 112-113).
Direct assessment generally addresses the skills and processes that English teachers agree are the most important aspects of writing, the focal points of instruction. Ironically, indirect assessment could affect the curriculum to its own detriment by providing an accurate measure of what it appears to neglect. Breland and Jones (1982) found a "mechanical" criterion, essay length, to be one of the very best predictors of holistic grades. This standard is wholly objective; it is very easy, quick, and inexpensive to apply; and it would yield extremely reliable ratings. Actual essay reading is subjective, painstaking, and costly; it can take days to finish; and the results are often quite unreliable by the usual testing standards. Yet no responsible person would suggest scoring essays merely on length, even though correlation coefficients would confirm the validity of the scores--at least for a few administrations. Often length is a byproduct of thorough development and is achieved through use of supporting materials. But it indicates development only so long as instruction emphasizes development and not length itself. If the emphasis were merely on producing length, the quality of essay writing, and hence the correlation between essay length and holistic grade, would suffer terribly.
There is clearly more justification for using multiple-choice tests than for grading solely on length, but English instructors worry that widespread use of objective tests will cause classes to drift away from the discipline of writing essays and toward a preoccupation with recognizing isolated sentence-level errors. Were that to happen, the celebrated correlations of indirect measures with direct criterion measures would be likely to drop. Perhaps indirect measures can substitute psychometrically for direct ones, but the practice of taking multiple-choice examinations cannot substitute for the practice of writing compositions. Multiple-choice tests will remain valid and useful only if objective test skills do not become the targets of instruction. An essay component in a test acknowledges the importance of writing in the curriculum and encourages cultivation of the skill that the multiple-choice test intends to measure.
Combined measures: Both essay and objective tests entail problems when used alone. Would some combination of direct and indirect measures then be desirable? From a psychometric point of view, it is sensible to provide both direct and indirect measures only if the correlation between them is modest--that is, if each exercise measures something distinct and significant. The various research studies reviewed by Stiggins (1981) "suggest that the two approaches assess at least some of the same performance factors, while at the same time each deals with some unique aspects of writing skill... each provides a slightly different kind of information regarding a student's ability to use standard written English" (pp. 1-2). Godshalk et al. (1966) unambiguously state, "The most efficient predictor of a reliable direct measure of writing ability is one which includes essay questions... in combination with objective questions" (p. 41). The substitution of a field-trial essay read once for a multiple-choice subtest in the one-hour ECT resulted in a higher multiple correlation coefficient in six out of eight cases; when the essay score was based on two readings, all eight coefficients were higher. The average difference was .022--small but statistically significant (pp. 36-37). Not all multiple-choice subtests were equally effective, though. In every case, the usage and sentence correction sections each contributed more to the multiple prediction than did essay scores based on one or two readings, and in most cases each contributed more than scores based on three or four readings (pp. 83-84). Apparently the success of the impromptu exercise can be repeated in large-scale assessments. Myers et al. (1966) reported that 145 readers spent five days giving each of the 80,842 ECT 20-minute essays two independent holistic readings. The operational reading proved about as reliable as comparable experimental readings. The brief essay, even when scores are based on only two readings, can make a small but unique contribution to the measurement of writing skills--something "over and beyond the prediction possible with objective measures alone" (Breland and Jones, 1982, p. 27). Accordingly, direct and indirect measures of writing ability are used together in several testing programs developed and administered by ETS. "In the English Composition Test," report Breland and Jones, "it has been found that the test and score reliability are not significantly diminished if 20 minutes of the hour were given to a direct measure using an essay question and the remaining 40 minutes were used for 70 objective questions" (p. 3). The committee of examiners was convinced that "for the purposes of the ECT (primarily college placement), the 20-minute essay offered adequate time to sample in a general way" (pp. 3-4) the higher-order skills.
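The gains reported above can be read in terms of the multiple correlation coefficient. The following is a minimal sketch of the underlying model, with illustrative symbols (the particular predictors named are assumptions for exposition, not the exact composition of the ECT studies):

\[
\hat{C} \;=\; b_0 + \sum_{k=1}^{m} b_k X_k,
\qquad
R \;=\; \operatorname{corr}(C, \hat{C}),
\]

where $C$ is the reliable criterion (the holistic score on a sizable sample of actual writing), the $X_k$ are the subscores entered as predictors (e.g., usage, sentence correction, and, where included, the impromptu essay), the $b_k$ are least-squares weights, and $R$ is the multiple correlation between the criterion and the best-weighted composite of subscores. On this reading, the average difference of .022 cited above is a difference between such $R$ values, and a subtest's "contribution to the multiple prediction" is the increment in $R$ attributable to that subtest.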
As discussed above, many critics consider 20 minutes insufficient for this purpose, but often they are thinking of the direct assessment as standing alone. Short essay and multiple-choice tests can be combined for their mutual advantage. For example, J. W. French was troubled in his 1961 study (cited in Diederich, French, & Carlton) by the lack of agreement among readers, but he observed better agreement when the readers focused on the higher-order skills that he termed "ideas," "form," and "flavor." Perhaps essay scores could be more reliable if readers were trained to concentrate just on those criteria and leave the assessment of lower-order skills to multiple-choice tests, which are far more thorough and consistent gauges of these capabilities. This division of measurement labor would make psychometric sense, not only because the essay scores might be made more reliable, but also because they would provide a unique kind of information. The Breland and Jones study (1982) underscores the feasibility of this approach. Taken together, ratings on the nine discourse, or higher-order, characteristics had exactly the same correlation with the ECT holistic scores as did the PWS holistic scores. This result "suggests that quick judgments of the nine discourse characteristics are comparable to a holistic judgment" (p. 13). In fact, multiple regression analysis showed only five of the nine discourse characteristics to be significant contributors to the prediction of ECT holistic scores. In order of importance, these were overall organization, noteworthy ideas, sentence logic, supporting material, and paragraphing and transition (p. 14). The reliability of the direct measure can also be enhanced by combining the essay score and the objective score in one total score for reporting. Several College Board tests employ this approach. And "assessments like the ECT combine direct and indirect assessment information because the direct assessment information is not considered reliable enough for reporting. A single ECT score is reported by weighting the direct and indirect information in accordance with the amount of testing time associated with each. Such a weighting may not be the most appropriate," Breland and Jones (1982) maintain. "Another approach to combining scores might be to weight scores by relative reliabilities" (or, perhaps, validities). They add that "reliable diagnostic subscores could possibly be generated by combining part scores from indirect assessments with analytic scores from direct assessments" (p. 28).
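To make the alternative weighting schemes concrete, here is a minimal sketch in standard-score form (the symbols are illustrative; Breland and Jones do not specify the composite at this level of detail):

\[
S \;=\; w_d z_d + w_o z_o,
\qquad
w_d : w_o \;=\;
\begin{cases}
t_d : t_o & \text{(testing-time weighting; } 20:40 \text{ for the ECT)}\\
r_{dd'} : r_{oo'} & \text{(reliability weighting)}
\end{cases}
\]

where $z_d$ and $z_o$ are the standardized direct (essay) and objective scores, $t_d$ and $t_o$ the testing times, and $r_{dd'}$ and $r_{oo'}$ the respective score reliabilities. Weighting by validities would replace the reliabilities with each score's correlation against an external criterion. Whichever ratio is chosen, the weights would normally be scaled to sum to one before the composite is placed on the reporting scale.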