Reliability, validity, and fairness of classroom assessments

In Chapter 5 of A Tool Kit for Professional Developers: Alternative Assessment, the authors suggest guidelines for evaluating the quality of assessment instruments. Good assessment requires minimizing factors that could lead to misinterpretation of results. Three criteria for meeting this requirement are reliability, validity, and fairness.

Reliability is defined as "an indication of the consistency of scores across evaluators or over time." An assessment is considered reliable when the same results occur regardless of when the assessment occurs or who does the scoring. There should be compelling evidence to show that results are consistent across raters and across scoring occasions.
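One way to gather evidence of consistency across raters is to have two people score the same set of student work and compare the results. The sketch below is illustrative only: the chapter does not prescribe a statistic, so the choice of simple percent agreement and Cohen's kappa is an assumption, as are the function names and the sample rubric scores.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of student papers on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score independently.
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score for every paper.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) from two raters on the same six papers.
scores_a = [3, 2, 4, 3, 1, 2]
scores_b = [3, 2, 3, 3, 1, 2]

print(f"Percent agreement: {percent_agreement(scores_a, scores_b):.2f}")
print(f"Cohen's kappa: {cohens_kappa(scores_a, scores_b):.2f}")
```

The same comparison can be run on one rater's scores from two different occasions to look at consistency over time. Low values suggest that the scoring criteria or rater training may need attention before results are interpreted.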

Validity is defined as "an indication of how well an assessment actually measures what it is supposed to measure." The chapter identifies three aspects of an assessment that must be evaluated for validity: tasks, extraneous interference, and consequences.

Every assessment requires students to complete some task or activity. A valid task should: reflect actual knowledge or performance, not test-taking skill and memorized algorithms; engage and motivate students to perform to the best of their ability; be consistent with current educational theory and practice; and be reviewed by experts to judge content quality and authenticity.

Extraneous interference occurs when there is something in the assessment that might get in the way of students being able to demonstrate what they know and can do. A valid assessment does not require knowledge or skills that are irrelevant to what is actually being assessed. Examples of extraneous factors might include: ability to read, write, role-play, or understand the context; personality; physical limitations; or knowledge of irrelevant background information.

Valid assessments minimize unintended negative consequences. Negative effects of assessments might include restricting curricula to what can be easily assessed, communicating unintended messages about power, control, or social status, and fostering narrow images of the nature of a particular discipline.

Fairness means that an assessment should "allow for students of both genders and all backgrounds to do equally well. All students should have equal opportunity to demonstrate the skills and knowledge being assessed." The fairness of the assessment is jeopardized if bias exists either in the task or in the rater.

Bias in a task is similar to the idea of extraneous interference; bias, however, refers to things that systematically affect entire groups of students rather than individual students. Consider the example provided in Chapter 5:

"If a task is set in the context of football and students who have a knowledge of football have an advantage on the task, that knowledge is an extraneous factor. The context becomes a biasing factor if particular groups of students know less about football than other groups of students. For example, in this society few girls have experience playing football. If boys, in general, have experience with the game and more knowledge of its structure and rules, then the task could be biased in favor of boys."

For a task to be fair, its content, context, and performance expectations should: reflect knowledge, values, and experiences that are equally familiar and appropriate to all students; tap knowledge and skills that all students have had adequate time to acquire; and be as free as possible of cultural, ethnic, and gender stereotypes.

Attitudes, beliefs, and values, often held unconsciously, may be reflected in the judgments of raters. For example, a rater may let preconceptions about the abilities of boys and girls influence scoring decisions. When evaluating a rater for fairness, consider whether: some feature of students' performance (e.g., handwriting, poor spelling, pet peeves) might influence how another, supposedly independent feature is judged; knowledge of a student's gender, ethnic heritage, curriculum track, etc. influences judgments; or knowledge of an individual student (e.g., poor performance in the past) affects how the current performance is judged.

A Tool Kit for Professional Developers: Alternative Assessment is a product of the Laboratory Network Project. The tool kit contains numerous examples of alternative assessments and provides professional development activities that can assist teachers in considering the significance of these concepts for their classrooms.

Copyright © North Central Regional Educational Laboratory. All rights reserved.