Select or Design Assessments That Elicit Established Outcomes

Herman, Aschbacher, and Winters (1992) suggest ten steps as part of the assessment design process.

  1. Clearly state the purpose for the assessment, and do not expect the assessment to meet purposes for which it was not designed.

  2. Clearly define what it is you want to assess (the achievement target).

  3. Match the assessment method to the purpose defined in step 1 and the achievement target defined in step 2.

  4. Specify illustrative tasks that require students to demonstrate certain skills and accomplishments. Avoid tasks that are merely interesting activities for students and may not yield evidence of a student's mastery of the desired outcomes.

  5. Specify the criteria and standards for judging student performance on the tasks selected in step 4. Be as specific as possible, and provide samples of student work that exemplify each of the standards.

  6. Develop a reliable rating process, one that allows different raters at different points in time to obtain the same (or nearly the same) results, or allows a single teacher in a classroom to assess each student using the same criteria.

    Figure 1

                 Is Rater in Perfect          Is Rater in Agreement with the
                 Agreement with the           Criterion Score, Plus or
                 Criterion Score?             Minus 1 Point?

    Rater        Paper #1  Paper #2  Average  Paper #1  Paper #2  Average
    Linda        yes       no        50%      yes       no        50%
    Donna        no        no        0%       yes       yes       100%
    Mark         yes       yes       100%     yes       yes       100%
    TOTAL        67% yes   33% yes   50%      100% yes  67% yes   83%

    Figure 1 illustrates a case in which three raters are asked to rate two criterion (exemplary) papers after some training. According to the results, Linda agrees with the score for paper 1 but not for paper 2; in fact, for paper 2 she is not even within 1 point. Donna is not in perfect agreement with the criterion scores on either paper 1 or 2, but is in agreement plus or minus one point on both papers. Mark is in agreement all the time and is ready to rate student work. Linda and Donna probably need more training. If you were to report these rater agreement results, you would say, "On average, raters obtained perfect agreement with criterion scores 50 percent of the time, and reached plus or minus one agreement 83 percent of the time." (Herman, Aschbacher, & Winters, 1992)
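    The arithmetic behind Figure 1 can be sketched in Python. The criterion and rater scores below are hypothetical (the figure reports only agreement, not the underlying scores); they are chosen so the yes/no pattern matches the figure.

    ```python
    # Hypothetical scores on a 1-6 scale; the actual scores are not given in
    # the text, only whether each rater agreed with the criterion score.
    criterion = {"Paper #1": 4, "Paper #2": 3}

    ratings = {
        "Linda": {"Paper #1": 4, "Paper #2": 1},
        "Donna": {"Paper #1": 3, "Paper #2": 2},
        "Mark":  {"Paper #1": 4, "Paper #2": 3},
    }

    def agreement_rates(ratings, criterion, tolerance):
        """Per-rater and overall percentage of scores within `tolerance`
        points of the criterion score (tolerance=0 means perfect agreement)."""
        per_rater = {}
        hits = total = 0
        for rater, scores in ratings.items():
            matches = [abs(scores[p] - criterion[p]) <= tolerance for p in criterion]
            per_rater[rater] = 100 * sum(matches) / len(matches)
            hits += sum(matches)
            total += len(matches)
        return per_rater, 100 * hits / total

    _, exact_overall = agreement_rates(ratings, criterion, tolerance=0)
    _, pm_overall = agreement_rates(ratings, criterion, tolerance=1)
    print(exact_overall)         # 50.0 -> perfect agreement half the time
    print(round(pm_overall, 1))  # 83.3 -> within one point about 83% of the time
    ```

    The overall rates reproduce the figure's summary: 50 percent perfect agreement and roughly 83 percent agreement within one point.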

  7. Avoid the pitfalls that threaten reliability and validity and can lead to mismeasurement of students. Assessors should ensure (a) adequate sampling of the content domain, (b) absence of bias or subjective scoring, (c) reasonable uniformity in administering assessments, (d) minimal effects of extraneous factors (e.g., too much reading on a mathematics or social studies test), (e) a suitable environment for assessment, and (f) awareness of and compensation for temporary factors affecting the student (e.g., parents' recent divorce or illness).

  8. Collect evidence/data showing that the assessment is reliable (yields consistent results) and valid (yields useful data for the decisions being made). With performance assessments, reliability and validity might be demonstrated through inter-rater agreement on scoring and evidence that students who perform well on the assessment also perform well on related items or tasks. With multiple-choice assessments, correlations should demonstrate internal consistency (students perform equally well or poorly on all related items) and show that performance on the test correlates to performance of similar skills presented differently. In the classroom, where teachers have multiple measurements of each student's performance, the formal collection of technical quality data is not necessary.

  9. Ensure "consequential validity." That is, the assessment should have a maximum of positive effects and a minimum of negative ones. For example, the assessment should give teachers and students the right messages about what is important to learn and to teach, it should not restrict the curriculum, it should be a useful instructional tool, and decisions made on the basis of the assessment results should be appropriate.

  10. Use test results to refine assessment and improve curriculum and instruction; provide feedback to students, parents, and the community.
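The internal-consistency check described in step 8 can be sketched as a split-half correlation: if related items measure the same skill, students who do well on one set should also do well on the other. The student scores below are invented for illustration only.

```python
import math

# Invented scores for six students on two halves of a set of related
# multiple-choice items; a high correlation supports internal consistency.
half_a = [8, 6, 9, 4, 7, 5]
half_b = [7, 6, 9, 3, 8, 4]

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

print(round(pearson(half_a, half_b), 2))  # 0.95 -> the halves rank students similarly
```

In practice an assessor would compute such a coefficient with an established statistical package rather than by hand; the point is only that "students perform equally well or poorly on all related items" is a claim that can be quantified.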

Copyright © North Central Regional Educational Laboratory. All rights reserved.