
Critical Issue:
Multiple Dimensions of Assessment That Support
Student Progress in Science and Mathematics

This Critical Issue was written by Asta Svedkauskaite, program associate at Learning Point Associates, and researched by Mary McNabb, Ed.D., director of Learning Gauge Inc.

Editorial guidance was provided by Gilbert Valdez, Ph.D., director of the North Central Regional Technology in Education Consortium and codirector of the North Central Eisenhower Mathematics and Science Consortium (NCEMSC); and Barbara Youngren, NCEMSC director.

The Critical Issue team would like to acknowledge the following experts for reviewing this article: Arlene Hambrick, Ph.D., National-Louis University faculty member and a private consultant in the field of equity in mathematics and science education; Terese A. Herrera, Ph.D., mathematics resource specialist at Eisenhower National Clearinghouse; and Nijole R. Mackevicius, External Resources coordinator at Chicago Public Schools' Office of Mathematics and Science.



Reform documents such as the National Science Education Standards (National Research Council, 1996), Benchmarks for Science Literacy (American Association for the Advancement of Science, 1993), and Principles and Standards for School Mathematics (National Council of Teachers of Mathematics, 2000) suggest that the focus of planning and teaching should be on providing all students with optimal opportunities to learn to their maximum potentials. High-stakes testing can easily narrow the scope and depth of the curriculum and thus in-depth learning of students. Tests used for accountability purposes to meet federal and state requirements need to be aligned with curriculum standards that explore science and mathematics in a comprehensive fashion. Furthermore, to continually improve curriculum and instruction, some tests need to provide ongoing (formative) feedback that a high-stakes test once or twice a year cannot provide. Taking into account the continuum between formative assessment to improve learning and summative assessment for accountability, this Critical Issue explores multiple dimensions of assessment from an overall perspective and also with specific references to science and mathematics.





Although standards-based reforms have been the focus of school improvement agendas in districts across the country for more than a decade, the No Child Left Behind (NCLB) Act (NCLB, 2002) makes content standards a required part of federal and state accountability systems. NCLB mandates that states have content standards in reading, writing, mathematics, and science, and that states develop or adopt tests to measure students' achievement of those standards. Specifically, it requires states to implement assessments in mathematics and reading in Grades 3–8 and at least once in Grades 10–12 by the 2005–06 school year (U.S. Department of Education, 2002). Beginning in the 2007–08 school year, schools must administer annual tests in science achievement, at least once in Grades 3–5, 6–9, and 10–12 (U.S. Department of Education, 2002). NCLB requires states to make a number of changes to their accountability systems in order to comply. Most notable changes include the following:

Besides state-administered testing, NCLB legislation calls for evaluation of classroom-level opportunities to learn for all students. Such evaluation requires the use of multiple forms of assessment for screening and diagnostic purposes, and it puts new emphasis on uses of formative assessment that can inform instructional decisions. Checking for student understanding almost always takes place in the classroom, where teachers can probe students' explanations.

As veteran elementary teacher Barbara Campbell explains, all kinds of understanding check-ins are important as they can show whether the student is learning. [Video: 1:03]

New classroom-based as well as large-scale assessments are continually under development to measure students' achievement in alignment with content standards, such as in science and mathematics. Under NCLB, states are required to further define their content standards into measurable grade-level standards for tracking adequate yearly progress (AYP). Classroom-based assessments aligned to a continuum of within-grade progress benchmarks or goals can provide teachers with effective tools for helping students progress toward their expected year-end outcomes, provided those outcomes account for the more complex and enduring concepts, principles, and skills within the science and mathematics curriculum.

It is important to reemphasize that under NCLB legislation, both classroom-based assessments and state-mandated standardized tests are increasing in frequency and importance. Classroom-based assessments provide teachers with feedback about the quality of responses students generate for open-ended, real-world tasks and may be flexible enough to go beyond basic skills testing.

According to Gilbert Valdez, Ph.D., learning and assessments need to be related to "big ideas" that students encounter in their daily lives beyond the classroom. [Video: :35]

The fear many educators experience is that the increase in standardized testing may narrow the curriculum as a response to policy pressures surrounding students' standardized basic test scores. Herman (1997) reports cumulative findings of researchers who investigated the impact of mandated, public testing and found that standardized testing encourages teachers and administrators to focus instruction on narrow test content as they tend to incorporate the following strategies:

Yet public testing cannot be treated dismissively. Standardized tests are important AYP measures and can become a positive force in education and provide an overall "positive enhancement of learning" (Wright, 2001, p. 60). They can provide an overall health report of K–12 education nationally and globally as well as compare the United States to other countries. As an example, in their Science editorial discussing the Trends in International Mathematics and Science Study (TIMSS) test for 2003, Bybee and Kennedy (2005) summarize: "At Grade 4, between 1995 and 2003, U.S. student scores held constant, although their international ranking declined slightly. But the average scores of U.S. eighth-graders made statistically significant improvements between 1995 and 2003 in both mathematics and science" (p. 481).

Results on such a scale show that standardized tests may be helpful to build a big picture of student achievement. Large-scale testing is helpful in assessing those skills and concepts that multiple-choice and other standardized formats can measure well. Tests in these formats also may be a wise use of time and dollars and may bring value for instruction. According to the Commission on Instructionally Supportive Assessment (2001), "state-mandated accountability tests must be useful to educators concerned about improving the instruction of children." The Commission presented nine requirements (or steps) for preparing statewide achievement tests that would provide educators with information they need to improve their instruction and benefit students in their own classroom. Applicable to this Critical Issue, there are two requirements that need to be highlighted:

  1. The state must provide educators with optional classroom assessment procedures that can measure students' progress in attaining content standards not assessed by state tests.
  2. A state must ensure that educators receive professional development focused on how to optimize children's learning based on the results of instructionally supportive assessments.

The national push for content standards and their implementation over the last 15 years has certainly had a cumulative effect on the overall educational system, influencing state standards, curriculum materials selection, and assessments, as well as teacher education (Bybee & Kennedy, 2005). The following sections of the Critical Issue provide information on assessment as it further evolves in the era of NCLB and national accountability mandates.

Assessment Reform Trends

Decade after decade, the tendency in this country was to implement standardized testing as a means to improve schools, starting at the district level and going up to the international level (Stiggins, 2004). The mistake, it appears, was to "believe that once-a-year standardized assessments alone can provide sufficient information and motivation to increase student learning" (Stiggins, 2004, p. 22). In fact, the literature reports few studies showing gains in student test scores or improvement in learning that are attributable to the presence of high-stakes tests alone (Stiggins, 2004).

In 1998, the National Research Council with the support of the National Science Foundation convened a Committee on the Foundations of Assessment to explore implications from advances in the learning sciences for improving educational assessment practices in schools. The committee focused on assessment of students' achievement rather than on tests that predict students' capacity for future achievement such as college entrance examinations. Starting with the realization that existing assessments are grounded in outdated theories of how people learn and how to measure what students know, the committee established a new foundation for assessment designs and related practices that better support 21st century learning goals.

The committee explains that assessment reform is necessary because of new expectations for 21st century knowledge and skills that arose, in part, from our increasingly competitive economy. The challenge of the 21st century is to ensure every student develops an appreciation for and mastery of science and mathematics subjects. It also is critical that students "become the creative, problem-solving, critical-thinking workforce of the 21st century" (Burmaster, 2003). Teachers therefore are faced with an enormous task of testing their students in light of the 21st century requirements for learning, work, citizenship, and life.

It is critical that assessment is diverse and divergent and develops through many pathways rather than from a single source, says Dr. Valdez. [Video: 1:07]

Assessment reform is a key component of a larger educational reform trend involving advancements in science and mathematics, technology innovations, and new accountability policies. The authentic assessment (or alternative assessment) movement responded to the lack of tests addressing higher-order thinking and other vital 21st century skills, as outlined through the work of the Partnership for 21st Century Skills. Herman (1997) stated:

"… There has been great enthusiasm for alternative assessments, which ask students to create their own responses rather than simply selecting them, assessments that many believe best represent the kinds of skills students will need for future success" (p. 2).

The Committee on the Foundations of Assessment recognizes and supports the increasing demand for better student outcomes in effective thinking and reasoning, communication and team-building, complex problem solving, higher order literacy and computation, and self-directed life-long learning skills (CEO Forum on Education and Technology, 2001; Haertel & Means, 2000; North Central Regional Educational Laboratory & Metiri Group, 2003). In support of the 21st century skills, the committee advocates that assessment be aligned with a range of skills: "Assessments must tap a broader range of competencies than in the past. They must capture the more complex skills and deeper content knowledge reflected in new expectations for learning" (National Research Council, 2001, pp. 22–23). It is therefore vital that state-level tests align to content standards and that school and classroom curriculum and assessments align to the same standards as well.

One of the most recent and influential documents that determine the knowledge and skills each student should learn for school and the workforce is a report published by the Partnership for 21st Century Skills, a business-education-policymaker organization, titled Learning for the 21st Century. The report articulates a collective vision of 21st century learning and provides an assessment tool that can help determine those skills. To help stakeholders create a 21st century learning environment, the Partnership has released several support documents: The Road to the 21st Century: A Policymaker's Guide to the 21st Century Skills, Route 21: An Interactive Guide to 21st Century Learning, and Milestones for Improving Learning and Education Guide for 21st Century Skills. This set of comprehensive tools will help stakeholders in education integrate 21st century learning skills into schools and beyond.

Trends in assessment reform have pushed some states to explore alternative assessment options and implement standards-based curriculum at the state level. Whether or not states have content standards articulated, under the NCLB legislation all states and school districts are required to gather data about student achievement over time and to hold schools, teachers, and students accountable for meeting curricular standards and for making AYP.

Summative Versus Formative Assessment. It is critical to distinguish between the functions and purpose of both summative and formative assessments.

As Dr. Valdez explains, both are important, yet for different purposes and intents. [Video: :27]

Assessment becomes formative "when the evidence is actually used to adapt the teaching to meet student needs" (Black & Wiliam, 1998b). It stands in contrast to summative assessment, which generally takes place after a period of instruction and requires making a judgment about the learning that has occurred (e.g., by grading or scoring a test or paper). In other words, while summative assessment is used for accountability to showcase and prove that students gained knowledge over a certain time, formative assessment is valuable in its intent to improve learning and to change instruction based on results. In fact, research shows that formative assessment has one of the biggest effects on learning, even equal to the effect of parental influence.

Dr. Valdez further discusses the two functions of assessment. [Video: 1:20]

Certain uses of achievement test results are termed "high stakes" if they carry serious consequences for students or for educators. High schoolwide scores may bring public praise or financial rewards, and low scores may bring public embarrassment and heavy sanctions. For individual students, high scores may bring a diploma attesting to exceptional academic accomplishment, while low scores may result in students being held back in grade or denied a high school diploma and, consequently, higher education opportunities.

The mandate to track students' yearly progress in academic subjects suggests a place for growth models of assessment. For growth to be seen, assessment cannot be sporadic. As Seltzer, Choi, and Thum (2002) explain it:

"… Much can be learned by moving beyond snapshots of student achievement at single points in time to analyses and summaries of student growth. To be sure, the notion of growth in knowledge and skills lies at the heart of definitions of learning and education" (p. 2).

In their Longitudinal Study of American Youth, Seltzer, Choi, and Thum (2002) demonstrate the use of within-student growth and between-student growth models of assessment, in which they captured time series data for students in several schools. Within-student growth models enable teachers to understand students' initial status in relationship to a set of content standards and formatively assess their rate of change toward increasing levels of competency. Using classroom-based assessments designed for the purpose of identifying students' initial status and growth over time can enhance teachers' ability to provide equitable opportunities to learn for students with varying instructional needs. On the other hand, between-student growth models enable schools, districts, and grade levels to estimate the "mean rate of change for a group of students, assess the extent to which students vary in their rates of change, and identify important correlates of change" (Seltzer, Choi, & Thum, 2002, p. 6).
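The distinction between within-student and between-student growth can be made concrete with a small numerical sketch. The example below is only an illustration under simplifying assumptions: it fits an ordinary least-squares line to each student's scores separately and then summarizes the slopes, which is much simpler than the models Seltzer, Choi, and Thum (2002) describe. The student labels, scores, and the function name `linear_growth` are hypothetical.

```python
from statistics import mean, pstdev

def linear_growth(scores):
    """Fit a least-squares line to one student's scores taken at equally
    spaced occasions 0, 1, 2, ... and return (initial status, rate of change)."""
    times = range(len(scores))
    t_bar = mean(times)
    s_bar = mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (s - s_bar) for t, s in zip(times, scores))
    slope = sxy / sxx
    intercept = s_bar - slope * t_bar
    return intercept, slope

# Hypothetical scale scores at four testing occasions (not real data).
students = {
    "A": [40, 45, 50, 55],
    "B": [60, 62, 64, 66],
    "C": [30, 40, 50, 60],
}

# Within-student view: each student's initial status and rate of change.
growth = {name: linear_growth(scores) for name, scores in students.items()}

# Between-student view: the mean rate of change for the group and how much
# students vary around it.
slopes = [slope for _, slope in growth.values()]
mean_rate = mean(slopes)
rate_spread = pstdev(slopes)

for name, (status, rate) in growth.items():
    print(f"Student {name}: initial status {status:.1f}, rate of change {rate:.1f}")
print(f"Group mean rate {mean_rate:.2f}, spread {rate_spread:.2f}")
```

In this made-up data, student C starts lowest but grows fastest, a pattern that a single end-of-year snapshot would not reveal.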

National standards have been developed in science and mathematics by the National Research Council and the National Council of Teachers of Mathematics, respectively.

Formative Assessment

Accountability for schooling often focuses teachers' attention on basic-skills test scores, leaving little time for promoting deep understanding, inquiry, or problem solving in the classroom. Yet it is important to consider that during the course of a year, teachers can build in many opportunities to assess how students are achieving in relation to AYP requirements and then use this information to make beneficial changes in instruction. Ideally, most teachers would like to find ways to balance both, even though assessment requirements can be overwhelming.

Teachers and students alike need to know what students are learning. When teachers know how students are progressing and where students are having trouble, they can use this information to make necessary instructional adjustments such as reteaching, trying alternative instructional approaches, or offering more opportunities for practice. These activities can lead to improved student success. The systematic and regular measurement of students' progress that occurs at the classroom level and the process which allows use of test results to shape instruction is the basis of formative assessment (Barchfeld-Venet, 2005).

According to Braun (2001), classroom-based assessment has three main functions. From the perspective of individual students, these functions are to choose, learn, and qualify. To "choose" means to use assessment data to help inform the selection of an appropriate course of study. To "learn" refers to using assessment data as feedback about how well one is learning. To "qualify" refers to using assessment data to certify an accomplishment. From a program perspective, assessment data are typically used to "place" a student into an appropriate course of study, to "monitor" the student's learning progress, and to "report" the accomplishments of the students in the program or school (Braun, 2001).

Along with definitions of the three functions of assessment, Braun (2001) also explains the importance of goal defining when selecting or designing classroom-based assessments: "The success of an assessment or an evaluation depends in large measure on being clear about the goals and then engaging in a disciplined design and development process that is focused on these goals." Much can be accomplished with such assessments that both support students' learning and inform teachers' instruction. That is why teachers may value classroom-based assessments and make better use of them than of the high-stakes standardized tests.

Regardless of the type of assessment, it is important to ensure alignment across content standards, opportunities to learn in the classroom, and the achievement measures (both published and teacher-created) used to evaluate progress toward proficiency in the disciplines. Providing students opportunities to learn especially is critical because the content calls for differentiated instruction to meet individual learner needs.

In addition, providing students with differentiated instruction to meet multiple individual needs requires educators to measure student learning through effective assessment. Foertsch (1999) defines 10 major underlying principles of effective testing and assessment in the classroom, including any achievement measure:

  1. Clearly define what you will assess. What do you expect your students to be able to do?
  2. Define the purpose of your assessment. Are you intending to conduct placement, formative, diagnostic, or summative assessment?
  3. Select or develop assessment procedures that closely match targeted learning goals or abilities. Do the tasks accurately reflect the learning goals you wish to assess? Do the formats of tests and items affect student responses? What irrelevant factors interfere with assessment? Does the test contain an appropriate sampling of test or item difficulty?
  4. Know the limitations of assessment procedures used. Does the way the test is developed, administered, or interpreted present any limitations?
  5. Use a variety of assessment procedures. Is your assessment comprehensive (e.g., do you make use of observations, class work, professional judgment, and student and parent input)?
  6. Evaluate the assessment or test you develop or use. Are they valid and reliable?
  7. Communicate assessment results clearly to all users. Do the students, parents, teachers, and other stakeholders understand the results?
  8. Consider and address personal implications. Are your biases influencing your professional judgment?
  9. Strengthen the link between assessment and learning. Is assessment helping improve instruction and learning?
  10. Assessment should serve a useful purpose and not be an end in itself. Are tests an integral part of your instruction and are they helping you guide instruction?

The 10 principles offer teachers guidance on how to create, use, and interpret classroom tests purposefully and judiciously. NCLB mandates for meeting opportunity-to-learn requirements reinforce the principle of fair testing. In the classroom, fair testing practices hold true for both formative and summative tests of all kinds. Assessment experts advocate that test content incorporate what is taught through the curriculum, materials, and instruction, and that students have opportunities to really learn before high-stakes consequences are imposed for failing any test (American Educational Research Association, 2000).

As a result of NCLB, academic standards are being translated into goals for AYP. Benchmarks aligned to state-level AYP requirements can provide the criteria for formative assessment objectives that identify students' level of progress throughout the school year. In science, teachers can use benchmarks that describe cognitive assessment objectives important to the science curriculum and then identify assessment tasks students can perform to provide evidence of their benchmark competence. Teachers and administrators can use progress mapping to track students' learning of science benchmarks, to provide students with feedback, and to set future learning goals. Dependence on one type of assessment does not provide a comprehensive view of student learning progress (Wright, 2001). "To be comprehensive, assessment must be authentic, meaning it resembles the classroom experience or 'real-life experience'" (Wright, 2001, p. 61). Use of classroom-based assessments that provide feedback about the quality of responses students generate for open-ended, real-world tasks and that go beyond basic-skills testing will help schools avoid the pitfalls of narrowing the curriculum as a response to policy pressures surrounding students' standardized basic test scores.

In recent years, there has been increasing interest in finding methods of assessment that go beyond the traditional ones. The need for teachers to adapt their teaching and to know what students are learning requires innovative methods of assessment. In science and mathematics, quizzes, tests, and teacher evaluations of homework, reports, and projects are often used to assess students' progress. Because such traditional assessment devices do not accomplish everything, other assessment methods also should be used, such as concept mapping or creating science portfolios (Hickey, Kindfield, Horwitz, & Christie, 2003; Atkin, Black, & Coffey, 2001). Portfolios, or collections of student work, can be used formatively if students and teachers annotate the entries and observe growth over time and practice (Duschl & Gitomer, 1997).

In science education, concept mapping has been widely recommended and used in a variety of ways to observe change in students' understanding of concepts over time, to assess what learners know, and to reveal their unique thought processes. Anderson-Inman, Ditson, and Ditson (1999) cite considerable evidence that concept mapping promotes meaningful learning in science. Formative feedback through concept mapping and science portfolios, for example, forms a significant basis for success, both for students and for teachers.

In general, well-designed formative assessments positively affect a student's role, motivation, and self-perception, which allow students to view assessment as supportive rather than punitive (Sadler, 1989; Barchfeld-Venet, 2005). Studies indicate that evaluation and reflection involving analysis and feedback are important aspects of effective teaching (McAninch, 1993). Performance feedback benefits teachers and students (Bransford, Brown, & Cocking, 1999; Pellegrino, Chudowsky, & Glaser, 2001) by doing the following:

The potential benefits of formative assessment are so significant that teachers cannot ignore this assistance to students. Black and Wiliam (1998a) conducted an extensive research review of 250 journal articles and book chapters winnowed from a much-larger pool to determine whether formative assessment raises academic standards in the classroom. They concluded that efforts to strengthen formative assessment produce significant learning gains as measured by comparing the average improvements in the test scores of the students involved in the innovation with the range of scores found for typical groups of students on the same tests. Black and Wiliam (1998b) found effect sizes between 0.4 and 0.7, with the greatest impact for low-achieving students: "… Improved formative assessment helps low achievers more than other students and so reduces the range of achievement while raising achievement overall." They recommend setting up local groups to tackle formative assessment at the school level while collaborating with other local schools. In science or mathematics, teacher groups can collaboratively design classroom-based assessments that will help them understand their students' progress toward the core subject concepts and skills, and plan their instruction accordingly.
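To give a rough sense of what an effect size in this range means, the sketch below computes a standardized mean difference: the gap between two group averages divided by the spread of scores in a typical group. This is one common way to define an effect size and is assumed here for illustration; it is not necessarily the exact calculation used in every study Black and Wiliam reviewed. The scores and the function name `effect_size` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def effect_size(innovation_scores, typical_scores):
    """Standardized mean difference: the gap between the group means divided
    by the sample standard deviation of the typical (comparison) group."""
    gap = mean(innovation_scores) - mean(typical_scores)
    return gap / stdev(typical_scores)

# Hypothetical test scores (not real data): a typical group and a group
# taught with strengthened formative assessment.
typical_scores = [50, 60, 70, 80, 90]
innovation_scores = [58, 68, 78, 88, 98]

d = effect_size(innovation_scores, typical_scores)

# If scores are roughly normally distributed, this is the share of the
# typical group scoring below the average student in the innovation group.
share_below = NormalDist().cdf(d)

print(f"effect size: {d:.2f}")
print(f"share of typical group below innovation average: {share_below:.0%}")
```

With these made-up numbers the effect size falls inside the 0.4 to 0.7 range reported above, which under the normality assumption places the average student in the innovation group well above the middle of the typical group.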

Collaborative work on assessment is one of the greatest forms of professional development, according to Dr. Valdez. [Video: :53]

While feedback generally originates from a teacher, students also can play an important role in formative assessment through self-evaluation. Two experimental research studies have shown that students who understand the learning objectives and assessment criteria and have opportunities to reflect on their work show greater improvement than those who do not (Fontana & Fernandes, 1994; Frederiksen & White, 1997). Students with learning disabilities who are taught to use self-monitoring strategies related to their understanding of reading and writing tasks also show performance gains (McCurdy & Shapiro, 1992; Sawyer, Graham, & Harris, 1992). Another student group that is commonly underrepresented is gifted learners, both identified and unidentified. Frequently, gifted students are misidentified as struggling learners because of an inappropriate assessment measure, when in fact they are not.

As a result, such students end up being bored and having high drop-out rates, Dr. Valdez comments. [Video: :54]

Since the goal of formative assessment is to gain an understanding of what students know and don't know, Black and Wiliam (1998b) encourage teachers to use questioning and classroom discussion as an opportunity to increase the knowledge of all students and improve their understanding. They caution, however, that teachers need to be sure to ask thoughtful, reflective questions rather than simple, factual ones and then give students adequate time to respond. In order to involve everyone, they suggest strategies such as the following:

Teachers might also assess students' understanding in the following ways:

In addition to these classroom techniques, tests and homework can be used formatively if teachers analyze where students are in their learning and provide specific, focused feedback regarding performance and ways to improve it. Black and Wiliam (1998b) make the following recommendations:

Emerging Assessment Concerns and Effect Factors

One of the biggest concerns in assessment is the validity and reliability of measures as well as instruments of those measures. The terms "test validity" and "test reliability" are often used but seldom understood by teachers.

As Dr. Valdez explains, the two terms have very specific meanings in statistics and are often misused in everyday language. [Video: 1:30]

Reliability is basically: Can that instrument be used reliably in other settings by different people? In other words, if you go to one classroom and I go to a different classroom, and the students are equally knowledgeable, would the fact that I'm administering the test rather than you give a different result? If a test shows the same results with different people administering it in different settings, then it's reliable. So the concept of inter-rater reliability that you often hear implies that regardless of who is giving the test, it's got reliability between the settings. So validity is basically about: Is a test doing what it's intended to do? And reliability is: Will it be consistent across different settings and different people administering the test?
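One simple way to put a number on this kind of consistency is to correlate the scores the same students earn when two different people administer the test. The sketch below assumes that approach and uses the Pearson correlation coefficient; it is only one of several reliability statistics, and the scores and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five students on the same test,
# given once by each of two administrators (not real data).
scores_admin_1 = [12, 15, 9, 20, 17]
scores_admin_2 = [13, 14, 10, 19, 18]

r = pearson_r(scores_admin_1, scores_admin_2)
print(f"correlation between administrations: {r:.2f}")
```

A value close to 1 suggests the test ranks students consistently regardless of who gives it. It says nothing about validity, that is, whether the test measures what it is intended to measure.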

Russell and Haney (1997) studied the effects of test administration mode to see whether administering tests on computer versus paper and pencil affects student performance on multiple-choice and written test questions. The study found that scores on responses written on computer were significantly higher than scores on those written by hand: "The size of the effects was found to be 0.94 on the extended writing task and .99 and 1.25 for the NAEP language arts and science short answer items. Effect sizes of this magnitude are unusually large and of sufficient size to be of not just statistical, but also practical significance (Cohen, 1977; Wolf, 1986)" (Russell & Haney, 1997). Such findings suggest that while the medium of instruction is important, the method of assessing student knowledge is critical. Russell and Haney (1997) concluded, "As more and more students in schools and colleges do their work with spreadsheets and word processors, the traditional paper-and-pencil modes of assessment may fail to measure what they have learned." Studies like this one suggest that more of today's paper-based assessments are becoming a thing of the past for various reasons, including outmoded test designs, mismatches between standards-based curriculum and assessments, differences in population groups' performance, a lack of feedback to help students improve, or inefficiency in administration, analysis, and reporting (Bennett, 2000).

In a later study, Russell (2000) claims that technologies used during learning activities also should be used during testing. He contends that student assessment methods should match the medium in which students typically work and advocates for state and local assessment programs to allow students the same technology assistance in the assessment process as they get in the learning process.

To take this argument further, Rose and Meyer (2002) explain that selecting appropriate testing accommodations for students on an individualized basis is a very complex endeavor involving the following factors that can confound results of traditional academic assessments:

Media characteristics of the technology used to administer the test also can confound test results when students are more adept in one medium (e.g., paper-and-pencil tests) than in another (e.g., computerized tests that require keyboarding skills). This is a concern that will continue to grow as more assessments become technology based. Technology is a powerful tool not only in instruction but also in assessment; hence, it should be used equally in both, says Dr. Valdez.

Bennett (2003) explains that as technology has become ever more central to schooling, assessing students via technology-based methods will be increasingly required. Education leaders in several states and numerous school districts already are implementing technology-based tests for low- and high-stakes decisions in elementary and secondary schools in all key content areas. Most of these tests are computer-adaptive tests using multiple-choice items, such as the one North Carolina piloted during the 2000–01 school year. The North Carolina Computerized Adaptive Testing System (NCCATS) pilot evaluated the feasibility, validity, and reliability of a computerized adaptive testing system to be used as an accommodation for students with disabilities to respond to multiple-choice test questions for the North Carolina End-of-Grade Tests of Reading and Mathematics in Grades 3–8 and the North Carolina High School Comprehensive Test in Grade 10. The NCCATS is one of eight state-level projects in K–12 technology-based assessment under way, according to Bennett (2003).
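For readers unfamiliar with how a computer-adaptive test chooses questions, the sketch below illustrates the general idea: each new item is picked to match the current estimate of the student's ability, and the estimate moves up or down after each answer. This is a deliberately simplified stand-in and not the NCCATS algorithm, which is not described here; operational systems typically rely on item response theory. The item bank, the difficulty scale, and the function names are all hypothetical.

```python
def run_adaptive_test(item_difficulties, answer_fn, n_items=5, start=0.0, step=1.0):
    """Administer up to n_items, each time choosing the unused item whose
    difficulty is closest to the current ability estimate."""
    remaining = dict(item_difficulties)  # item id -> difficulty
    ability = start
    administered = []
    for _ in range(min(n_items, len(remaining))):
        # Pick the item that best matches the current estimate.
        item = min(remaining, key=lambda k: abs(remaining[k] - ability))
        difficulty = remaining.pop(item)
        correct = answer_fn(item, difficulty)
        # Simple heuristic update (an assumption, not an IRT estimate):
        # move up after a correct answer, down after an incorrect one.
        ability += step if correct else -step
        step /= 2  # shrink the adjustment so the estimate settles
        administered.append((item, correct))
    return ability, administered


# Hypothetical item bank on an arbitrary difficulty scale.
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0,
        "q5": 2.0, "q6": 0.5, "q7": 1.5}


# Hypothetical student who answers correctly whenever difficulty <= 0.8.
def student(item, difficulty):
    return difficulty <= 0.8


ability, administered = run_adaptive_test(bank, student)
print(ability, administered)
```

In this run the student sees q3, q4, q6, q7, and q5 in that order, and the estimate settles between the hardest item answered correctly and the easiest item missed. The easy items q1 and q2 are never administered, which is why adaptive tests can be shorter than fixed-form tests.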

Recent research into computer applications for assessing students' writing shows that automated essay scoring methods can reach a reliable level of agreement with human raters' essay scores, which has far-reaching implications for assessments that include open-ended responses (Burstein, Marcu, Andreyev, & Chodorow, 2001; Foltz, Gilliam, & Kendall, 2000; Kintsch, Steinhart, Stahl, & LSA Research Group, 2000). Another area of assessment research uses artificial neural networks to generate performance models of students' complex problem solving during simulated science tasks that do not have predetermined solution paths (Vendlinski & Stevens, 2003). According to Bennett (2003), state efforts will need to go beyond the initial achievement of computerizing traditional multiple-choice tests to create assessments that facilitate learning and instruction in ways that paper measures cannot.
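One common way to express agreement between an automated scorer and a human rater is the share of essays on which the two scores match exactly, and the share on which they differ by no more than one point. The sketch below illustrates that calculation. It is not necessarily the measure used in the studies cited above, and the score lists and function name are hypothetical.

```python
def agreement_rates(human_scores, machine_scores):
    """Return (exact, adjacent) agreement as fractions of all essays.

    exact    -- both scores are identical
    adjacent -- scores differ by at most one point (includes exact matches)
    """
    if not human_scores or len(human_scores) != len(machine_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    pairs = list(zip(human_scores, machine_scores))
    exact = sum(h == m for h, m in pairs) / len(pairs)
    adjacent = sum(abs(h - m) <= 1 for h, m in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical essay scores on a 1-5 rubric, invented for illustration.
human = [4, 3, 5, 2, 4, 3, 1, 5]
machine = [4, 3, 4, 2, 5, 3, 3, 5]

exact, adjacent = agreement_rates(human, machine)
print(f"exact: {exact:.3f}, adjacent: {adjacent:.3f}")
```

For these invented scores, five of the eight essays receive identical scores and seven of the eight are within one point of each other.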

Besides the method of test administration, assessment also is influenced by varying approaches to curriculum and instruction. Different approaches to curriculum and instruction inherently carry assumptions about underlying learning theory and teaching philosophy that influence the types of assessments needed for alignment (Shepard, 2000). Bennett (2000) points out that test results also are affected by cognitive constructs, including knowledge organization, problem representation, mental models, and automaticity, that many tests do not account for explicitly.


Teachers and administrators should do the following:




Schools encounter some common implementation pitfalls when trying to improve their use of assessments for decision making and to provide timely, ongoing feedback to students about their progress, strengths, and areas for improvement. The barriers that teachers face can include a lack of time, limited assessment-literacy skills, delayed access to student test data, assessment that loses its meaning in the real world, and curriculum that is taught to the test.

First of all, teachers may not have the time or assessment-literacy skills to create classroom assessments that generate reliable student progress data to inform their lesson planning. Assessment-literacy skills include the ability to create assessments, to analyze and reflect on student progress data, and to provide meaningful feedback to learners. Also, teachers may not have timely access to relevant student data, both progress data and summative data, from high-stakes tests.

The main pitfall, however, is teaching to the test. Many states are designing or identifying new tests that align with content-area standards as part of their efforts to implement their NCLB plans, which provides an opportunity to make high-stakes tests more meaningful. However, it is quite common for schools to focus on tests so intensely that they tend to teach students to take those tests rather than to learn beyond the test. As Reich (2001) warns:

"They're best at measuring the ability to regurgitate facts and apply standard modes of analyses. … [But] it's far more important to learn how to identify and solve new problems, think critically, and challenge assumptions. … Standardized tests can help measure whether children have achieved an adequate level of communication skills and numeracy, and even help pinpoint where children need more guidance. … It is training a generation of young people to become exquisitely competent at taking standardized tests, and a generation of teachers to become exceedingly good at teaching how to take them. Neither of these competences has much to do with preparing young people for what they will encounter when they leave our schools."

Furthermore, limiting the curriculum to teach to a test that only covers basic scientific facts and mathematical formulas because of short class periods, lack of lesson planning time, or history of low student performance on high-stakes tests violates existing guidelines for fair use of tests.

"Any narrowing of the curriculum, along with the confusion of training to pass a test with broader notions of learning and education are especially problematic side effects of high-stakes testing for low-income students. The poor, more than their advantaged peers, need not only the skills that training provides but need the more important benefits of learning and education that allow for full economic and social integration in our society" (Amrein & Berliner, 2002).

More importantly, such limitation on curriculum delivery has become a major concern not only for teachers but also for parents. They fear their children are spending time to learn facts and complete multiple drills, "rather than spending time on problem solving and the development of critical and analytical thinking skills. Teachers at the grade levels at which the test is given are particularly vulnerable to the pressure of teaching to the test" (Domenech, 2000).

A recent National Academy of Sciences/National Research Council report on school learning makes clear that schooling that too closely resembles training, as in preparation for testing, cannot accomplish the task the nation has set for itself, namely, the development of adaptive and educated citizens for the new millennium (Heubert & Hauser, 1999). Some researchers even question:

Is there "evidence of student learning, beyond the training that prepared them for the tests they take, in those states that depend on high-stakes tests to improve student achievement? … Although states may demonstrate increases in scores on their own high-stakes tests, it appears that transfer of learning is not a typical outcome of their high-stakes testing policy" (Amrein & Berliner, 2002).

Such transfer of learning, however, may not necessarily happen in hands-on learning experiences. So if teachers have been teaching science using a hands-on, discovery-oriented approach, they may have good reason to be anxious when the students in their class are expected to display a command of science on standardized tests. After all, teaching science in a hands-on fashion may not provide the background in science knowledge that students from more traditional, textbook-oriented classrooms have. On the other hand, the students in a discovery-oriented class are very likely to acquire many science process skills and develop a favorable attitude toward science.


When Amrein and Berliner (2002) examined 18 states to see if their high-stakes testing programs were affecting student learning, they summarized arguments (pp. 4–5) highlighted in research studies supporting high-stakes testing as follows:

Amrein and Berliner (2002) examined the validity of these statements through both quantitative and qualitative research as well as by interviewing teachers who work in high-stakes testing environments. They concluded that "these statements are true only some of the time, or for only a modest percent of the individuals who were studied" (p. 5) and mentioned stress as one of the reasons for such varied outcomes.

Stiggins (1999) describes the pressure from high-stakes tests and its rationale as "the way to spur greater effort … through intimidation by means of the threat of dire consequences for low test scores" (p. 192). Some school districts have adopted this attitude in order to motivate low-performing students to achieve. However, there are other schools that think it is actually possible to make better use of standardized science tests with less anxiety.

Learning the basics is important for future learning; however, methods for learning the basics can be embedded in lessons that foster deep content-area learning. Students should understand that the high-stakes tests will not measure all that they have learned. If, for example, your class has carried out a hands-on unit on the use and waste of water in your school, explain to the students that they should not expect to see questions about it on their tests. Point out that some of the science units they have studied have given them a lot of information that will not be measured but will still matter a lot. Emphasize that they should not feel bad if many of the things they have learned are not on the test.

To further reduce student anxiety regarding high-stakes tests, teachers can take some class time before the test to teach basic standardized test-taking strategies. The students will likely take many standardized tests during their school years and when they pursue employment. Investing time and energy to improve test-taking skills may annoy teachers because of their own feelings about testing, but it may help students become more successful test takers and not be intimidated by the test itself.


The following links provide additional information on examples of formative and summative assessment:

Additional Links

Assessment Ideas for the Elementary Science Classroom

This Web site deals specifically with the needs of elementary and/or middle-level science teachers. In addition to discussing conferences, interviews, contracts, and portfolios, this site provides templates or guide sheets for creating and using various assessment techniques.

Assessment Standards for School Mathematics

This is the third book in the National Council of Teachers of Mathematics Standards series. It has been developed as a guide for examining current assessment practices and planning new assessment systems.

Classroom Assessment and the National Science Education Standards

This 2001 book for prospective and practicing teachers, designed as a companion to National Science Education Standards, discusses how formal and informal assessments can guide and improve pedagogy.

Consumer Guide: Performance Assessment

This Consumer Guide, sponsored by the U.S. Department of Education, is an archived publication for teachers, parents, and others who are interested in alternative techniques for student assessment. It not only discusses techniques but also lists the addresses of organizations that can provide additional information about the topic.

Exemplars: Standards-Based Assessment and Instruction

This Internet site offers differentiated performance assessment tasks that meet the national standards for science and mathematics and are designed to improve both assessment and instruction within these disciplines. All of the performance tasks include teaching tips, suggestions for use, rubrics, and a list of possible solutions.

Family Education Network: Authentic Assessment

This Web site provides techniques and resources for authentic assessment.

Math Assessment Techniques on PBS Teacher Source

The PBS Teacher Source section on mathematics has a selection of mathematics techniques that teachers will find helpful and practical. The techniques include questioning, conferencing, interviewing, self-assessing, and multiple-choice testing.

National Center for Education Statistics: The Nation's Report Card

The Nation's Report Card is a continuing assessment of what American students know and can do in a variety of subjects, including science and mathematics. The testing program itself is called the National Assessment of Educational Progress (NAEP). It samples student achievement at Grades 4, 8, and 12.

Classroom Assessment and the National Science Education Standards

The National Research Council has produced a useful, accessible book on classroom assessment in science that contains many vignettes about how teachers can adjust their teaching based on their observations, questioning, and analysis of student work. While the anecdotes are specific to K–12 science teaching, the chapters apply broadly: they address the documented value of formative assessment for classroom achievement, what it requires in terms of teacher development, and how classroom assessment relates to summative assessment such as state tests.

National Science Education Standards

This Web site contains the complete text of the National Science Education Standards, which are premised on a conviction that all students deserve and must have the opportunity to become scientifically literate.

No Child Left Behind Act

The reauthorized Elementary and Secondary Education Act focuses on four ideals: stronger accountability for results, more freedom for states and communities, encouraging proven education methods, and more choices for parents.

Northwest Regional Educational Laboratory

The It's Just Good Teaching series focuses on using assessment to inform and improve instruction in mathematics and science. The training kit, Toolkit98, offers readings, overheads, exercises, and handouts to help groups of teachers think through assessment issues in their schools.

Principles and Standards for School Mathematics

The National Council of Teachers of Mathematics offers a list of standards by grade and a summary of principles.

Project 2061

The focus of Project 2061 is to foster scientific literacy. It is a reform initiative developed by the American Association for the Advancement of Science, which seeks to improve the quality, increase the relevance, and broaden the availability of science, mathematics, and technology education. The entire text of Benchmarks for Science Literacy is available online. The publication expands the science-literacy goals in Project 2061's publication Science for All Americans into specific goals that should be achieved by the end of Grades 2, 5, 8, and 12.

Project Zero Research Projects

The Project Zero Web site features Assessment Projects and Current Research Projects which offer alternative ways of assessing student progress in several disciplines, including science. Most of the projects have an interdisciplinary focus.

The Heart of the Matter: Using Standards and Assessment to Learn

This professional development book presents strategies for using standards and assessments to support meaningful learning for all students. It explores issues related to assessment, accountability, and standards-based reform in the context of teaching for understanding and critical inquiry.

Using Data, Getting Results: A Practical Guide for School Improvement in Mathematics and Science

This resource guidebook with CD-ROM is designed to help teachers, administrators, and community members institute reform in school science and mathematics curricula with a focus on improving student learning.

Western States Benchmarking Consortium

The Consortium's strategic framework can help districts conduct self-assessments in four strategic areas: student learning, data-driven decision making, capacity development, and community connectedness.


American Association for the Advancement of Science (AAAS)
1200 New York Avenue NW
Washington, DC 20005
Phone: 202-326-6400

American Educational Research Association (AERA)
1230 Seventeenth Street NW
Washington, DC 20036-3078
Phone: 202-223-9485

Educational Testing Service (ETS)
Rosedale Road
Princeton, NJ 08541
Phone: 609-921-9000

National Assessment of Educational Progress (NAEP)
National Center for Education Statistics
1990 K Street, NW
Washington, DC 20006
Phone: 202-502-7300

National Association for State Boards of Education (NASBE)
277 South Washington Street, Suite 100
Alexandria, VA 22314
Phone: 703-684-4000

National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
300 Charles E. Young Drive North
GSE&IS Building Third Floor/Mailbox 951522
Los Angeles, CA 90095-1522
Phone: 310-206-1532

National Council of Teachers of Mathematics (NCTM)
1906 Association Drive
Reston, VA 20191-1502
Phone: 703-620-9840

National Science Foundation (NSF)
4201 Wilson Boulevard
Arlington, VA 22230
Phone: 703-292-5111

National Science Teachers Association (NSTA)
1840 Wilson Boulevard
Arlington, VA 22201-3000
Phone: 703-243-7100


Date posted: 2005
Copyright © North Central Regional Educational Laboratory. All rights reserved.