R.J. Dietel, J.L. Herman, and R.A. Knuth
NCREL, Oak Brook, 1991
The reasons why we assess vary considerably across many groups of people within the educational community.
|Who Needs To Assess?||Purposes of Assessment|
|Policymakers||Policymakers use assessment to:
* Set standards
* Focus on goals
* Monitor the quality of education
* Reward/sanction various practices
* Formulate policies
* Direct resources including personnel and money
* Determine effects of tests
|Administrators and school planners||Administrators and school planners use assessment to:
* Monitor program effectiveness
* Identify program strengths and weaknesses
* Designate program priorities
* Assess alternatives
* Plan and improve programs
|Teachers and administrators||Teachers and administrators use assessment to:
* Make grouping decisions
* Perform individual diagnosis and prescription
* Monitor student progress
* Carry out curriculum evaluation and refinement
* Provide mastery/promotion/grading and other feedback
* Motivate students
* Determine grades
|Parents and students||Parents and students use assessment to:
* Gauge student progress
* Assess student strengths and weaknesses
* Determine school accountability
* Make informed educational and career decisions
Billions of dollars are spent each year on education, yet there is widespread dissatisfaction with our educational system among educators, parents, policymakers, and the business community. Efforts to reform and restructure schools have focused attention on the role of assessment in school improvement. After years of growth in both the amount of formal testing and the consequences attached to poor test scores, many educators have begun to strongly criticize the measures used to monitor student performance and evaluate programs. They claim that traditional measures fail to assess significant learning outcomes and thereby undermine curriculum, instruction, and policy decisions.
The higher the stakes attached to test results, the greater the pressure on teachers and administrators to devote more and more time to preparing students to do well on the tests. As a consequence, narrowly focused tests that emphasize recall have led to a similar narrowing of the curriculum and an emphasis on rote memorization of facts, with little opportunity to practice higher-order thinking skills. The timed nature of the tests and their one-right-answer format have led teachers to give students practice in responding to artificially short texts and selecting the best answer rather than inventing their own questions or answers. When teachers teach to traditional tests by providing daily skill instruction in formats that closely resemble the tests, their instructional practices are both ineffective and potentially detrimental, because they rely on outmoded theories of learning and instruction.
Good assessment information provides accurate estimates of student performance and enables teachers or other decisionmakers to make appropriate decisions. The concept of test validity captures these essential characteristics: the extent to which an assessment actually measures what it is intended to measure and permits appropriate generalizations about students' skills and abilities. For example, suppose a student answers nine items correctly on a ten-item addition/subtraction test. If the test is valid, we can safely generalize that the student will likely do as well on similar items not included on the test. In short, the results of a good test or assessment represent something beyond how students perform on a particular task or set of items; they represent how a student performs on the objective that those items were intended to assess.
Measurement experts agree that test validity is tied to the purposes for which an assessment is used. Thus, a test might be valid for one purpose but inappropriate for other purposes. For example, our mathematics test might be appropriate for assessing students' mastery of addition and subtraction facts but inappropriate for identifying students who are gifted in mathematics. Evidence of validity needs to be gathered for each purpose for which an assessment is used.
A second important characteristic of good assessment information is its consistency, or reliability. Will the assessment results for this person or class be similar if they are gathered at some other time or under different circumstances or if they are scored by different raters? For example, if you ask someone what his/her age is on three separate occasions and in three different locations and the answer is the same each time, then that information is considered reliable. In the context of performance-based and open-ended assessment, inter-rater reliability also is essential; it requires that independent raters give the same scores to a given student response.
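The passage names inter-rater reliability but does not say how to quantify it. As one illustration, the Python sketch below computes two common indices for a pair of raters: exact percent agreement and Cohen's kappa, which discounts the agreement two raters would reach by chance alone. The choice of Cohen's kappa, the 1-4 rubric, the function names, and the scores are all assumptions for illustration, not part of the original guide.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses to which two raters gave exactly the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of responses")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = percent_agreement(rater_a, rater_b)  # also validates the inputs
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that two independent raters, each using
    # their own observed score distribution, happen to pick the same score.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical score for every response
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1-4 rubric) that two raters gave the same ten essays.
rater_1 = [4, 3, 3, 2, 4, 1, 2, 3, 3, 2]
rater_2 = [4, 3, 2, 2, 4, 1, 2, 3, 4, 2]
print(f"exact agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa:   {cohens_kappa(rater_1, rater_2):.2f}")
```

With these hypothetical scores the raters agree exactly on 8 of 10 responses (0.80), but kappa is lower (about 0.73) because some of that agreement would be expected by chance.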
Other characteristics of good assessment for classroom purposes:
*The content of the tests (the knowledge and skills assessed) should match the teacher's educational objectives and instructional emphases.
*The test items should represent the full range of knowledge and skills that are the primary targets of instruction.
*Expectations for student performance should be clear.
*The assessment should be free of extraneous factors that unnecessarily confuse or inadvertently cue student responses. (For example, unclear directions and convoluted questions may confuse a student and confound his/her ability to demonstrate the skills the assessment is intended to measure. A math item that requires reading skill will depress the performance of students who lack adequate reading comprehension.)
Researchers at the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) are developing an expanded set of validity criteria for performance-based, large-scale assessments. Assessment researchers Bob Linn, Eva Baker, and Steve Dunbar have identified eight criteria that performance-based assessments should meet in order to be considered valid.
Criteria for Valid Performance-Based Assessments
|Consequences||Does using an assessment lead to intended consequences or does it produce unintended consequences, such as teaching to the test? For example, minimum competency testing was intended to improve instruction and the quality of learning for students; however, its actual effects too often were otherwise (a shallow drill-and-kill curriculum for remedial students).|
|Fairness||Does the assessment enable students from all cultural backgrounds to demonstrate their skills, or does it unfairly disadvantage some students?|
|Transfer and generalizability||Do the results of the assessment generalize to other problems and other situations? Do they adequately represent students' performance in a given domain?|
|Cognitive complexity||Do the assessments adequately assess higher levels of understanding and complex thinking? We cannot assume that performance-based assessments will test a higher level of student understanding because they appear to do so. Such assumptions require empirical evidence.|
|Content quality||Are the tasks selected to measure a given content area worth the time and effort of students and raters?|
|Content coverage||Do the assessments enable adequate content coverage?|
|Meaningfulness||Are the assessment tasks meaningful to students and do they motivate them to perform their best?|
|Cost and efficiency||Has attention been given to the efficiency of the data collection designs and scoring procedures? (Performance-based assessments are by nature labor-intensive.)|
Multiple-choice measures have provided a reliable and easy-to-score means of assessing student outcomes. In addition, a considerable body of test theory and statistical technique has been developed to support their construction and use. Although there is now great excitement about performance-based assessment, we still know relatively little about methods for designing and validating such assessments. CRESST is one of many organizations and schools researching the promises and realities of such assessments.
Analysis of Traditional Views
Methods of assessment are determined by our beliefs about learning. According to early theories of learning, complex higher-order skills had to be acquired bit by bit, by breaking learning down into a series of prerequisite skills (a building-blocks-of-knowledge approach). It was assumed, incorrectly, that once basic skills had been learned by rote, they could be assembled into complex understandings and insight. However, evidence from contemporary cognitive psychology indicates that all learning requires the learner to think and actively construct evolving mental models.
From today's cognitive perspective, meaningful learning is reflective, constructive, and self-regulated. People are seen not as mere recorders of factual information but as creators of their own unique knowledge structures. To know something is not just to have received information but to have interpreted it and related it to other knowledge one already has. In addition, we now recognize the importance of knowing not just how to perform, but also when to perform and how to adapt that performance to new situations. Thus, the presence or absence of discrete bits of information, which is typically the focus of traditional multiple-choice tests, is not of primary importance in the assessment of meaningful learning. Rather, what is important is how and whether students organize, structure, and use that information in context to solve complex problems.
Contrary to past views of learning, cognitive psychology suggests that learning is not linear, but that it proceeds in many directions at once and at an uneven pace. Conceptual learning is not something to be delayed until a particular age or until all the basic facts have been mastered. People of all ages and ability levels constantly use and refine concepts. Furthermore, there is tremendous variety in the modes and speed with which people acquire knowledge, in the attention and memory capabilities they can apply to knowledge acquisition and performance, and in the ways in which they can demonstrate the personal meaning they have created.
Current evidence about the nature of learning makes it apparent that instruction which strongly emphasizes structured drill and practice on discrete, factual knowledge does students a major disservice. Learning isolated facts and skills is more difficult without meaningful ways to organize the information and make it easy to remember. Also, applying those skills later to solve real-world problems becomes a separate and more difficult task. Because some students have had such trouble mastering decontextualized "basics," they are rarely given the opportunity to use and develop higher-order thinking skills.
Recent studies of the integration of learning and motivation have also highlighted the importance of affective and metacognitive skills in learning. For example, research suggests that poor thinkers and problem solvers differ from good ones not so much in the particular skills they possess as in their failure to use them in certain tasks. Acquiring knowledge and skills is not sufficient to make one a competent thinker or problem solver. People also need to acquire the disposition to use the skills and strategies, as well as the knowledge of when and how to apply them. These are appropriate targets of assessment.
The role of the social context of learning in shaping higher-order cognitive abilities and dispositions has also received attention over the past several years. It has been noted that real-life problems often require people to work together as a group in problem-solving situations, yet most traditional instruction and assessment have involved independent rather than small group work. Now, however, it is postulated that groups facilitate learning in several ways: modeling effective thinking strategies, scaffolding complicated performances, providing mutual constructive feedback, and valuing the elements of critical thought. Group assessments, thus, can be important.
Since the influence of testing on curriculum and instruction is now widely acknowledged, educators, policymakers, and others are turning to alternative assessment methods as a tool for educational reform. The movement away from traditional multiple-choice tests toward alternative assessments (variously called authentic assessment or performance assessment) has included a wide variety of strategies such as open-ended questions, exhibits, demonstrations, hands-on execution of experiments, computer simulations, writing in many disciplines, and portfolios of student work over time. These terms and assessment strategies have led the quest for more meaningful assessments that better capture the significant outcomes we want students to achieve and better match the kinds of tasks they will need to accomplish to assure their future success.
Trends Stemming from the Behavioral to Cognitive Shift
|Emphasis of Assessment||Behavioral Views||Cognitive Views|
|View of learner||Passive, responding to environment||Active, constructing knowledge|
|Scope of assessment||Discrete, isolated skills||Integrated and cross-disciplinary|
|Beliefs about knowing and being skilled||Accumulation of isolated facts and skills||Application and use of knowledge|
|Emphasis of instruction and assessment||Delivering maximally effective materials||Attention to metacognition, motivation, self-determination|
|Characteristics of assessment||Paper-pencil, objective, multiple-choice, short answer||Authentic assessments on contextualized problems that are relevant and meaningful, emphasize higher-level thinking, do not have a single correct answer, have public standards known in advance, and are not speeded|
|Frequency of assessment||Single occasion||Samples over time (portfolios) which provide basis for assessment by teacher, students, and parents|
|Who is assessed||Individual assessment||Assessment of group process skills on collaborative tasks which focus on distributions over averages|
|Use of technology for administration and scoring||Machine-scored bubble sheets||High-tech applications such as computer-adaptive testing, expert systems, and simulated environments|
|What is assessed||Single attribute of learner||Multidimensional assessment that recognizes the variety of human abilities and talents, the malleability of student ability, and the fact that IQ is not fixed|
Characteristics of alternative assessments include the following:
*Students are involved in setting goals and criteria for assessment.
*Students perform, create, produce, or do something.
*Tasks require students to use higher-level thinking and/or problem solving skills.
*Tasks often provide measures of metacognitive skills and attitudes, collaborative skills and intrapersonal skills as well as the more usual intellectual products.
*Assessment tasks measure meaningful instructional activities.
*Tasks often are contextualized in real-world applications.
*Student responses are scored according to specified criteria, known in advance, which define standards for good performance.
While assessment has the potential to improve learning for all students, historically it has acted as a barrier rather than a bridge to educational opportunity. Assessments have been used to label students and put them in dead-end tracks. Traditional tests have been soundly criticized as biased and unfair to minority students. The assessment of language minority students has been particularly problematic.
A key point regarding equity as applied to performance-based assessment is made by Yale Professor Emeritus Edmund Gordon. "We begin with the conviction that it is desirable that attention be given to questions of equity early in the development of an assessment process rather than as an add-on near the end of such work....The task then is to find assessment probes (test items) which measure the same criterion from contexts and perspectives which reflect the life space and values of the learner."
Robert Linn says, "The criterion of equity needs to be applied to any assessment. It is a mistake to assume that shifting from standardized tests to performance-based assessments will eliminate concerns about biases against racial/ethnic minorities or that such a shift will necessarily lead to equality of performance.
"Although many at-risk students come to school deficient in prior knowledge that is important to school achievement, teachers and schools can make a substantial difference through the construction of assessments that take into account the vast diversity of today's student populations. Gaps in performance among groups exist because of differences in familiarity, exposure, and motivation of the subjects being assessed. Substantial changes in instructional strategy and resource allocation are required to give students adequate preparation for complex, time-consuming, open-ended assessments. Providing training and support for teachers to move in these directions is essential.
"Questions of fairness arise not only in the selection of performance tasks but in the scoring of responses. As Stiggins has stated, it is critical that the scoring procedures are designed to assure that 'performance ratings reflect the examinee's true capabilities and are not a function of the perceptions and biases of the persons evaluating the performance.' The same could be said regarding the perceptions and biases of the persons creating the test. The training and calibrating of raters is critical in this regard."
What we know about performance-based assessment is limited, and many issues remain to be resolved. We do know that approaches which encourage new assessment methods need the broad-based support of the community and school administration. Like any change in schools, changes in assessment practices will require:
* Strong leadership support
* Staff development and training
* Teacher ownership
* Continuing follow-up and support for change through coaching and mentoring
* Environments that support experimentation and risk-taking
As schools move toward more performance-based assessment, they also will need to come to some resolution on a number of issues, among them that performance-based assessments:
*Require more time to develop
*May limit content coverage
*Require a shift in teaching practices
*Require substantial time for administration
*Lack a network of colleagues for sharing and developing
*Require new methods of aggregating and reporting data
*Require new thinking about how to use the results for comparative purposes
The examples of excellence in this program clearly show that in successful schools, teaching is a multidimensional activity. One of the most powerful of these dimensions is that of "teacher as researcher." Not only do teachers need to use research in their practice, they need to participate in "action" research, in which they continually investigate and strive for improved learning. The key to action research is to pose a question or goal, and then design actions and evaluate progress in a systematic, cyclical fashion as the actions are carried out. Below are five major ways that you can become involved as an action researcher.
1. Use the checklist found at the end of this section to evaluate your school and
2. Implement the models of excellence presented in this program. Ask yourself:
*What outcomes do the teachers in this program accomplish that I want my students to achieve?
*How can I find out more about their classrooms and schools?
*Which ideas can I most easily implement in my classroom and school?
*What will I need from my school and community?
*How can I evaluate progress?
3. Form a team and initiate a research project. A research project can be designed to generate working solutions to a problem. The issues for your research group to address are:
*What is the problem or question we wish to solve?
*What will be our approach?
*How will we assess the effectiveness of our approach?
*What is the time frame for working on this project?
*What resources do we have available?
*What outcomes do we expect to achieve?
4. Investigate community needs and integrate solutions within your class activities. Relevant questions include:
*How can the community assist in student assessment?
*What is the community's vision of learning that defines what should be assessed?
*How will the community benefit from improved assessment techniques?
*What kind of relationships can my class forge with the community?
5. Establish support groups consisting of school personnel and community members. The goals of these groups are to:
*Share teaching and learning experiences both in and out of school
*Discuss research and theory related to learning
*Act as mentors and coaches for one another
*Connect goals of the community with goals of the school
The following are activities that groups such as your PTA, church, and local Chamber of Commerce can do together with your schools.
1. Visit your school informally for discussions using the checklist at the end of this section.
2. Consider ways that schools and community members can work together to provide:
*Materials for a rich learning environment (e.g., real literature in print and audio form, computers) in which authentic assessment can occur
*Opportunities for students and teachers to learn out of school
*Opportunities for students to access adults as role models, tutors, aides, and experts as they engage in self assessment
*Opportunities for students to provide community services such as surveys, newsletters, plays, and tutoring
*Opportunities for students to participate in community affairs
*Opportunities for administrators, teachers, or students to visit managers and company executives to learn how assessment occurs in the 'real' world
3. Promote school and community forums to debate the national education goals:
*Invite your local television and radio stations to host school and community forums.
*Have "revolving school/community breakfasts" (community members visit schools for breakfast once or twice a month, changing the staff and community members each time).
*Gather information on the national education goals and their assessment.
*Gather information on alternative models of schooling.
*Gather information on best practices and research in the classroom.
* Examine the vision of performance assessment set forth by CRESST.
*Invite teachers who are involved in performance-based assessment in your area to discuss the changes being recommended by CRESST.
Some of the important questions and issues to discuss in your forums are:
*Have we reviewed the national education goals documents to arrive at a common understanding of each goal?
*What will students be like who learn in schools that achieve the goals?
*What must schools be like to achieve the goals?
*Do we agree with the goals, and how high do we rate each?
*What is the reason for national pessimism about achieving them?
*How are our schools doing now in terms of achieving each?
*Why is it important for us to achieve the goals?
* What are the consequences for our community if we don't achieve them?
*What assumptions are we making about the future in terms of Knowledge, Technology and Science, Humanities, Family, Change, Population, Minority Groups, Ecology, Jobs, Global Society, and Social Responsibility? Discuss in terms of each of the goal areas.
4. Consider ways to use this program guidebook, Alternatives for Measuring Performance, to promote understanding and commitment from school staff, parents, and community members.
The items below are based on best practices of the teachers and researchers in Program 4. The checklist can be used to look at current practices in your school and to jointly set new goals with parents and community groups.
Vision of Learning
*Meaningful learning experiences for students and school staff
*Students encouraged to make decisions about their learning and to assess their own performance
*Restructuring to promote learning in and out of school
*High expectations for learning for all students
*A community of problem solvers in the classroom and in the school
*Teachers and administrators committed to achieving the national education goals
Curriculum and Instruction
*Identification of core concepts
*Curriculum that calls for a comprehensive repertoire of learning and assessment strategies
*Collaborative teaching and learning involving student-generated questioning and sustained dialogue among students and between students and teachers
*Teachers assessing to build new information on student strengths
*Authentic tasks in the classroom such as solving everyday problems, collecting and analyzing data, investigating patterns, and keeping journals
*Opportunities for students to engage in learning and assessment out of school with community members
*Homework that is challenging enough to be interesting but not so difficult as to cause failure
*Assessments that respect multiple cultures and perspectives
*A rich learning environment with places for children to engage in sustained problem solving and self assessment
*Instruction that enables children to develop an understanding of the purposes and methods of assessment
*Opportunities for children to help decide performance criteria and methods
Assessment and Grouping
*Assessment that informs and is integral to instruction
*Assessment sessions that involve the teacher, student, and parents
*Performance-based assessment such as portfolios that include drafts, journals, projects, and problem-solving logs
*Multiple opportunities to be involved in heterogeneous groupings, especially for students at risk
*Public displays of student work and rewards
*Methods of assessing group performance
*Group assessments of teacher, class, and school
*Opportunities for teachers to attend conferences and meetings on assessment
*Teachers as researchers, working on research projects
*Teacher or school partnerships/projects with colleges and universities
*Opportunities for teachers to observe and coach other teachers
*Opportunities for teachers to try new practices in a risk-free environment
Involvement of the Community
*Community members' and parents' participation in assessing performance as experts, aides, guides, or tutors
*Active involvement of community members on task forces for curriculum, staff development, assessment, and other areas vital to learning
*Opportunities for teachers and other school staff to visit informally with community members to discuss the life of the school, resources, and greater involvement of the community
Policies for Students at Risk
*Students at risk integrated into the social and academic life of the school
*Policies/practices to display respect for multiple cultures and role models
*A culture of fair assessment practices
Northwest Regional Educational Laboratory Videotapes on Assessment (ten videotapes available separately or as a set of ten). Contact IOX Assessment Associates, 5420 McConnell Ave., Los Angeles, CA 90066, (213) 822-3275.
Airasian, P. W. (1991). Classroom Assessment. New York: McGraw-Hill. Easy-to-read book on basic assessment theory and practice. Includes information on performance-based assessments that will be of special interest to teachers.
Perrone, V. (Ed.). (1991). Expanding Student Assessment. Alexandria, VA: Association for Supervision and Curriculum Development. Major emphasis on better ways to assess student learning.
Wittrock, M. C., & Baker, E. L. (Eds.). (1991). Testing and Cognition. Englewood Cliffs, NJ: Prentice Hall. Series of articles from distinguished assessment researchers focusing on current assessment issues and the learning process.
The following reports on assessment are available from the National Center for Research on Evaluation, Standards, and Student Testing (CRESST). Contact Kim Hurst (310) 206-1512 to order reports. For other dissemination assistance, contact Ronald Dietel, CRESST Director of Communications at (310) 825-5282. Or write to: CRESST/UCLA Graduate School of Education, 405 Hilgard Avenue, Los Angeles, CA 90024-1522.
Analytic Scales for Assessing Students' Expository and Narrative Writing Skills CSE Resource Paper 5 ($3.00). These rating scales have been developed to meet the need for sound, instructionally relevant methods for assessing students' writing competence. Knowledge of students' performance utilizing these scales can provide valuable information for assessing achievement and facilitating instructional planning at the classroom, school, and district levels.
Effects of Standardized Testing on Teachers and Learning - Another Look
CSE Technical Report 334, 1991 ($5.50). This study asks "what are the effects of standardized testing on schools, and the teaching and learning processes within them? Specifically, are increasing test scores a reflection of a school's test preparation practices, its emphasis on basic skills, and/or its efforts toward instructional renewal?" The report investigates how testing affects instruction and how the meaning of test scores differs between schools serving lower socioeconomic status (SES) students and those serving more advantaged students.
Complex, Performance-Based Assessments: Expectations and Validation Criteria
CSE Technical Report 331, 1991 ($3.00). This report cautions that, although performance-based measures are derived from actual student performance, it is wrong to assume that such measures are any more indicative of student achievement than multiple-choice tests are. Therefore, performance-based measures need to meet validity criteria. The authors recommend eight important criteria by which to judge the validity of new student assessments, including open-ended problem-solving, portfolio assessments, and computer simulations.
Guidelines For Effective Score Reporting
CSE Technical Report 326, 1991 ($3.00). According to this report: "Despite growing national interest in testing and assessment, many states present their data in such a dry, uninteresting way that they fail to generate any attention." The authors examined current practices and recommendations of state reporting of assessment results, answering the question "how can we be sure that the vast collection of test information is properly collected, analyzed, and implemented?" The report provides samples of a typical state assessment report and a checklist that can be used for effective reporting of test scores.
State Activity and Interest Concerning Performance-Based Assessments
CSE Technical Report 322, 1990 ($2.50). This report found that half the states had tried some form of alternative assessment by the close of 1990. Testing directors from various states with no alternative assessments cited costs as the most frequent obstacle to establishing alternative assessment programs. This report suggests that getting around the financial problem may involve spreading the cost over other budgets, such as staff development and curriculum development; testing only a sample of students; sharing costs with local education agencies; or rotating the subjects tested across years.
Achievement test An examination that measures educationally relevant skills or knowledge about such subjects as reading, spelling, or mathematics.
Age norms Values representing typical or average performance of people of certain age groups.
Authentic task A task performed by students that has a high degree of similarity to tasks performed in the real world.
Average A statistic that indicates the central tendency or most typical score of a group of scores. Most often average refers to the sum of a set of scores divided by the number of scores in the set.
Battery A group of carefully selected tests that are administered to a given population, the results of which are of value individually, in combination, and totally.
Ceiling The upper limit of ability that can be measured by a particular test.
Criterion-referenced test A measurement of achievement of specific criteria or skills in terms of absolute levels of mastery. The focus is on performance of an individual as measured against a standard or criterion rather than against performance of others who take the same test, as with norm-referenced tests.
Diagnostic test An intensive, in-depth evaluation process with a relatively detailed and narrow coverage of a specific area. The purpose of this test is to determine the specific learning needs of individual students and to be able to meet those needs through regular or remedial classroom instruction.
Dimensions, traits, or subscales The subcategories used in evaluating a performance or portfolio product (e.g., in evaluating students' writing, one might rate student performance on subscales such as organization, quality of content, mechanics, and style).
Domain-referenced test A test in which performance is measured against a well-defined set of tasks or body of knowledge (domain). Domain-referenced tests are a specific type of criterion-referenced test and have a similar purpose.
Grade equivalent The estimated grade level that corresponds to a given score.
Holistic scoring Scoring based upon an overall impression (as opposed to traditional test scoring which counts up specific errors and subtracts points on the basis of them). In holistic scoring the rater matches his or her overall impression to the point scale to see how the portfolio product or performance should be scored. Raters usually are directed to pay attention to particular aspects of a performance in assigning the overall score.
Informal test A non-standardized test that is designed to give an approximate index of an individual's level of ability or learning style; often teacher-constructed.
Inventory A catalog or list for assessing the absence or presence of certain attitudes, interests, behaviors, or other items regarded as relevant to a given purpose.
Item An individual question or exercise in a test or evaluative instrument.
Norm Performance standard that is established by a reference group and that describes average or typical performance. Usually norms are determined by testing a representative group and then calculating the group's test performance.
Normal curve equivalent Standard scores with a mean of 50 and a standard deviation of approximately 21.
Norm-referenced test An objective test, standardized on a reference group, in which an individual's performance is evaluated in relation to the performance of others; contrasted with criterion-referenced test.
Objective percent correct The percent of the items measuring a single objective that a student answers correctly.
Percentile The percent of people in the norming sample whose scores were below a given score.
Percent score The percent of items that are answered correctly.
Performance assessment An evaluation in which students are asked to engage in a complex task, often involving the creation of a product. Student performance is rated on the process the student engages in, on the product of the task, or on both. Many performance assessments emulate actual workplace activities or real-life skill applications that require higher-order processing skills. Performance assessments can be individual or group-oriented.
Performance criteria A predetermined list of observable standards used to rate performance assessments. Effective performance criteria include considerations for validity and reliability.
Performance standards The levels of achievement pupils must reach to receive particular grades in a criterion-referenced grading system (e.g., 90 or higher receives an A, 80 to 89 receives a B, etc.) or to be certified at particular levels of proficiency.
Portfolio A collection of representative student work over a period of time. A portfolio often documents a student's best work, and may include a variety of other kinds of process information (e.g., drafts of student work, the student's self-assessment of the work, parents' assessments). Portfolios may be used to evaluate a student's abilities and improvement.
Process The intermediate steps a student takes in reaching the final performance or end-product specified by the prompt. Process includes all strategies, decisions, rough drafts, and rehearsals, whether deliberate or not, used in completing the given task.
Prompt An assignment or directions asking the student to undertake a task or series of tasks. A prompt presents the context of the situation, the problem or problems to be solved, and criteria or standards by which students will be evaluated.
Published test A test that is publicly available because it has been copyrighted and published commercially.
Rating scales A written list of performance criteria associated with a particular activity or product which an observer or rater uses to assess the pupil's performance on each criterion in terms of its quality.
Raw score The number of items that are answered correctly.
Reliability The extent to which a test is dependable, stable, and consistent when administered to the same individuals on different occasions. Technically, this is a statistical term that defines the extent to which errors of measurement are absent from a measurement instrument.
Rubric A set of guidelines for giving scores. A typical rubric states all the dimensions being assessed, contains a scale, and helps the rater place the given work properly on the scale.
Screening A fast, efficient measurement for a large population to identify individuals who may deviate in a specified area, such as the incidence of maladjustment or readiness for academic work.
Specimen set A sample set of testing materials that is available from a commercial test publisher. The sample may include a complete individual test without multiple copies or a copy of the basic test and administration procedures.
Standardized test A form of measurement that has been normed against a specific population. Standardization is obtained by administering the test to a given population and then calculating means, standard deviations, standardized scores, and percentiles. Equivalent scores are then produced for comparisons of an individual score to the norm group's performance.
Standard scores Scores expressed as deviations from a population mean, usually in standard deviation units.
Stanine One of the steps in a nine-point scale of standard scores.
Task A goal-directed assessment activity, demanding that the student use their background of knowledge and skill in a continuous way to solve a complex problem or question.
Validity The extent to which a test measures what it was intended to measure. Validity indicates the degree of accuracy of either predictions or inferences based upon a test score.
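Several of the score types defined above (raw score, percent score, percentile, standard score, normal curve equivalent, stanine, and performance standards) are simple arithmetic on a student's answers and a norming sample. The Python sketch below illustrates them under stated assumptions: the norming sample and 50-item test are hypothetical, the standard deviation is the population SD, and the normal curve equivalent and stanine are computed linearly from the standard score. Published NCEs and stanines are normalized from percentile ranks, so the two agree only when scores are roughly normal. The C, D, and F cutoffs extend the A/B example in the performance standards entry and are likewise assumptions.

```python
import math
from statistics import mean, pstdev

# Hypothetical norming sample: raw scores on a hypothetical 50-item test.
# Illustrative numbers only, not real data.
NORM_SAMPLE = [22, 25, 28, 30, 31, 33, 34, 35, 35, 36,
               37, 38, 38, 39, 40, 41, 42, 44, 46, 48]
NUM_ITEMS = 50


def raw_score(answers: list[str], key: list[str]) -> int:
    """Raw score: number of items answered correctly (assumes equal lengths)."""
    return sum(a == k for a, k in zip(answers, key))


def percent_score(raw: int, num_items: int = NUM_ITEMS) -> float:
    """Percent score: percent of items answered correctly."""
    return 100.0 * raw / num_items


def percentile(raw: int, sample: list[int] = NORM_SAMPLE) -> float:
    """Percent of the norming sample whose scores fall below the given score."""
    below = sum(1 for s in sample if s < raw)
    return 100.0 * below / len(sample)


def z_score(raw: float, sample: list[int] = NORM_SAMPLE) -> float:
    """Standard score: deviation from the sample mean in SD units."""
    return (raw - mean(sample)) / pstdev(sample)


def nce(raw: float, sample: list[int] = NORM_SAMPLE) -> float:
    """Normal curve equivalent (mean 50, SD about 21.06), linear approximation."""
    return 50.0 + 21.06 * z_score(raw, sample)


def stanine(raw: float, sample: list[int] = NORM_SAMPLE) -> int:
    """Stanine (mean 5, SD 2): band of width 0.5 SD, clipped to the range 1..9."""
    s = math.floor(2.0 * z_score(raw, sample) + 5.5)
    return max(1, min(9, s))


def letter_grade(pct: float) -> str:
    """Criterion-referenced cutoffs; the C, D, and F bands are assumptions."""
    if pct >= 90:
        return "A"
    if pct >= 80:
        return "B"
    if pct >= 70:
        return "C"
    if pct >= 60:
        return "D"
    return "F"
```

For a raw score of 40 on this hypothetical test, `percent_score(40)` gives 80.0, `percentile(40)` gives 70.0 (14 of the 20 norming scores are lower), and `letter_grade(80.0)` returns "B". Note that the percent score and letter grade depend only on the student's own answers (criterion-referenced), while the percentile, NCE, and stanine change if the norming sample changes (norm-referenced).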
Airasian, P. W. (1991). Classroom assessment. New York: McGraw-Hill.
Baker, E. L. (1991, September). Alternative assessment and national education policy. Paper presented at the symposium on Limited English Proficient Students, Washington, D.C.
Charles, R., Lester, F., & O'Daffer, P. (1987). How to evaluate progress in problem solving. Reston, VA: National Council of Teachers of Mathematics.
Daves, C. W. (Ed.). (1984). The uses and misuses of tests. San Francisco: Jossey-Bass.
Diamond, E. E., & Tuttle, C. K. (1985). Sex equity in testing. In S. S. Klein (Ed.), Handbook for achieving sex equity through education. Baltimore, MD: Johns Hopkins University Press.
Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Flavell, J. H. (1985). Cognitive development. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Gardner, H. (1987, May). Developing the spectrum of human intelligence. Harvard Educational Review, 76-82.
Gould, S. J. (1981). The mismeasure of man. New York: Norton and Co.
Henerson, M. E., Morris, L. L., & Fitz-Gibbon, C. T. (1987). How to measure attitudes. Newbury Park, CA: Sage Publications.
Illinois State Board of Education. (1988). An assessment handbook for Illinois schools. Springfield, IL: Author.
Lindheim, E., Morris, L. L., & Fitz-Gibbon, C. T. (1987). How to measure performance and use tests. Newbury Park, CA: Sage Publications.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria (CSE Technical Report 331). Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Myers, M. (1985). The teacher-researcher: How to study writing in the classroom. Urbana, IL: National Council of Teachers of English.
Paris, S. G., Lawton, T. A., Turner, J., & Roth, J. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20(5).
Piaget, J. (1952). The origins of intelligence in children. New York: Norton and Co.
Popham, W. J. (1981). Modern educational measurement. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Smith, M. L. (1991). Put to the test: The effects of external testing on teachers. Educational Researcher, 20(5).
Stiggins, R. J. (1991, March). Assessment literacy. Phi Delta Kappan, 72(7).
Tierney, R. J., Carter, M. A., & Desai, L. E. (1991). Portfolio assessment in the reading-writing classroom. Norwood, MA: Christopher-Gordon Publishers.
Wiggins, G. (1989, May). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 703-713.
Wise, S. L., Plake, B. S., & Mitchell, J. V. (Eds.). (1988). Applied measurement in education. Hillsdale, NJ: Erlbaum.
Wolf, D., Bixby, J., Glenn, J. G., & Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. In G. Grant (Ed.), Review of research in education (Vol. 17). Washington, D.C.: American Educational Research Association.