How is student achievement measured?
In education, the familiar metrics used to report student
performance on standardized achievement tests include percentile
rank, grade equivalent, normal curve equivalents, and scale scores.
Normal curve equivalent scores and scale scores are the only types
of measures that should be used to measure change in individual
student, classroom, or school performance. The other measures are
either inappropriate for use in statistical analyses or inappropriate
for measuring change because of the way they are scaled.
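To illustrate why percentile ranks are poorly suited to measuring change, the sketch below converts percentile ranks to normal curve equivalents (NCEs). It assumes the conventional NCE definition, 50 + 21.06 × z, where z is the standard normal deviate for the percentile; the function name and the example percentiles are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Conventional NCE scaling constant: it makes NCEs of 1, 50, and 99
# coincide with percentile ranks of 1, 50, and 99.
NCE_SCALE = 21.06


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    # z-score of the percentile under a standard normal distribution
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + NCE_SCALE * z


# The same 10-point percentile gain, measured in NCE units
# (hypothetical percentiles, for illustration only).
mid_gain = percentile_to_nce(60) - percentile_to_nce(50)
tail_gain = percentile_to_nce(99) - percentile_to_nce(89)
print(f"50th -> 60th percentile: {mid_gain:.1f} NCE points")
print(f"89th -> 99th percentile: {tail_gain:.1f} NCE points")
```

Under these assumptions, a 10-point percentile gain near the middle of the distribution corresponds to a much smaller NCE gain than the same 10-point gain near the top, so percentile gains are not comparable across students who start at different levels.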
Also take note of whether the study reports a calculation of the
size of the improvement in achievement in terms of an effect
size. Since different studies tend to use different measures
of achievement, it is often difficult to compare the relative effectiveness
of different software packages or the same software package across
different studies. Once achievement gains are converted into
effect sizes, they can be compared directly across software packages
and across studies. Effect sizes also give us a
sense of whether a gain in achievement is important (i.e., whether it
is big or small). As a quick rule of thumb, an effect size of 0.30
or greater is considered to be important in studies of educational
programs. It is also important to know how an effect size compares
with more familiar metrics of learning. For example, an effect size
of 0.1 is equivalent to about one month of learning gain. Effect
sizes also need to be interpreted in practical terms. A small
effect size may still have real practical significance if the intervention
is relatively inexpensive compared to competing options, if the
effect occurs among all groups of students, and if the effect accumulates
over time.
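As a rough sketch of how an effect size might be computed and read, the example below assumes the effect size is a standardized mean difference (Cohen's d with a pooled standard deviation); the studies discussed here may use other definitions. The function names and the test scores are hypothetical, and the conversion to months applies the approximation above that an effect size of 0.1 is about one month of learning gain.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using a pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


def approx_months_of_learning(effect_size: float) -> float:
    """Apply the rule of thumb that 0.1 is roughly one month of learning gain."""
    return effect_size / 0.1


# Hypothetical scale scores, invented purely for illustration.
software_group = [52, 55, 61, 58, 49, 63, 57, 60]
comparison_group = [50, 53, 58, 55, 47, 60, 54, 59]

d = cohens_d(software_group, comparison_group)
print(f"effect size: {d:.2f}")
print(f"approximate months of learning: {approx_months_of_learning(d):.1f}")
print("meets the 0.30 rule of thumb" if d >= 0.30 else "below the 0.30 rule of thumb")
```

Because the result is expressed in standard deviation units rather than raw test points, the same calculation can be applied to studies that used different achievement tests.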
The test that is used by researchers to measure student performance
in a study may affect the magnitude of the effect size that is estimated.
Researchers have found that the use of "local" tests,
specifically developed to measure how students perform on tasks
closely aligned with the content of the software, results in larger
effect sizes than the use of more common standardized tests.
Sometimes researchers measure technology's effectiveness by using
tests that fit the specific technology program's goals so narrowly
that they do not reflect more common and familiar academic outcomes.
Ideally, researchers use tests that have been validated for use
across more than one program but that are also sensitive to the
kinds of things students might be expected to learn, given the software's
design.