by Ullik Rouk

Assessment

Achieving high standards is the essence of accountability. To measure how well schools and students are meeting high standards, states have developed new assessment systems, or are refining existing systems, to align with the standards. Student scores on these assessments have become the number one indicator of district, school, teacher, and student achievement.

Most states use a mix of tools to measure student performance, including norm-referenced tests, criterion-referenced tests, and performance assessments. They seek a balance, combining open-ended formats that ask students to "invent" solutions to problems with more traditional standardized, norm-referenced tests. At one time, many states dropped multiple-choice test items in favor of performance and portfolio assessments. Testing experts believe these assessments provide a more accurate picture of what students know and can do, but questions about the reliability of such tests have since caused some states, including Kentucky, a leader in education accountability, to reintroduce multiple-choice items (Whitford & Jones, 1999). Others put some multiple-choice items back into assessments to reduce cost and time requirements.

According to Quality Counts, 48 states are administering statewide testing programs, and 37 said they incorporate "performance tasks" in their assessments. Among the 48, 41 have aligned their tests in at least one subject to standards. Quality Counts reports that 21 states have aligned their standards and tests in all four primary academic subjects.

In addition to using student test scores, states often gauge how well schools and districts are doing by looking at factors such as attendance and dropout rates. In Louisiana, for example, high school students’ scores on the state test account for 60 percent of a school’s score and scores on the national test account for 30 percent. The remaining 10 percent is determined by a school’s student attendance and dropout rates.
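The Louisiana weighting described above amounts to a weighted sum. The following is only a minimal sketch of that arithmetic: the 60/30/10 weights come from the text, while the function name, the component names, the common 0-100 scale, and the sample values are assumptions made for illustration and are not the state's actual formula or data.

```python
# Weights as described in the text for Louisiana high schools.
# Component names and the 0-100 scale are hypothetical.
WEIGHTS = {
    "state_test": 0.60,          # scores on the state test
    "national_test": 0.30,       # scores on the national test
    "attendance_dropout": 0.10,  # attendance and dropout rates
}


def composite_school_score(components):
    """Return the weighted sum of component indices.

    `components` maps each name in WEIGHTS to an index that is
    assumed to be on the same 0-100 scale.
    """
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Made-up index values for a single hypothetical school.
example_school = {
    "state_test": 70.0,
    "national_test": 60.0,
    "attendance_dropout": 90.0,
}
example_score = composite_school_score(example_school)
print(round(example_score, 1))
```

With these made-up values, the state test contributes 42 points, the national test 18, and attendance and dropout rates 9, which shows how heavily the composite leans on test results.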

Controversy over High Stakes

No one disputes that testing has a place in state accountability systems. Yet, the nature of these tests has steeped them in controversy. Many tests are "high-stakes," meaning that they have significant consequences for students and schools that do not meet achievement expectations. For example, "high stakes" come into play when students are denied promotion or high school graduation because of their low performance on tests, or when a school is totally reorganized because of recurrent low test scores.

Advocates of testing insist that the objectivity of test results ends the uncertainty about what students know and don’t know. Local districts and schools can use test results to identify their instructional strengths and weaknesses, and make decisions about their instructional programs. Critics warn that high-stakes tests can distort and narrow the purpose of schooling to the quest for test scores (WestEd, 2000). In their view, high-stakes tests encourage teachers to focus solely on what is tested, obscure richer ways of judging schools, and place blame for ineffective teaching on students. Rather than using test scores alone to judge students and schools, some assessment experts recommend treating them as one among many sources of information about how students and schools are doing. They also argue that testing instruments and technology are not up to the demands that high-stakes accountability places upon them (Linn, 2000).

Assessment in the Region

In Arkansas. Over the next four years, Arkansas will phase in the Arkansas Comprehensive Testing, Assessment, and Accountability Program. The plan incorporates standards, professional development for teachers, and state tests for students in grades 4, 6, and 8. Beginning in 2003-04, schools will be assessed according to state test scores, attendance, graduation rates, school safety, and teacher qualifications.

In Louisiana. This spring, fourth and eighth graders in Louisiana took LEAP 21 (Leap into the 21st Century) for the first time. The new state test is designed to end the promotion of students who are not academically ready to enter the next grade. Districts must offer summer school programs to students who fail, after which students have another opportunity to take the exam. Students in tenth and eleventh grades will also take the test beginning in 2001. Last fall, K-8 schools were rated on the basis of student performance on the exam, as well as on the Iowa Test of Basic Skills, student attendance, and dropout rates.

In New Mexico. The accountability system will go into effect during the 2000-01 academic year. The assessment portion of the system consists of the CTBS/Terra Nova Exam in mathematics, science, language arts, and social studies for grades 3 through 9. Students are also administered a writing examination in grades 4 and 6 and a reading assessment in grades 1 and 2. Students must master the tenth-grade standards in order to graduate from high school but have until their senior year to pass the exam.

In Oklahoma. The statewide assessment program, Oklahoma's Core Curriculum Tests, assesses fifth and sixth graders in mathematics, reading, science, U.S. history, geography, arts, and writing. Eleventh graders are also tested in geography. The core curriculum tests for eleventh-grade students were discontinued in 1999, and will be replaced with "end-of-instruction tests" starting next school year.

In Texas. Every spring, Texas students in grades 3 through 8 and in grade 10 take the Texas Assessment of Academic Skills (TAAS) exams. Beginning in 2002-03, ninth and eleventh grade students will take the exams as well. TAAS scores combine with attendance and graduation rates to give schools a state accountability rating. Texas also requires that students pass examinations in reading, writing, and mathematics in order to graduate from high school. These exams, which test students on content through the end of ninth grade, are first administered in tenth grade. Students must pass all three exams but retake only those they failed.

Sources: Arkansas Comprehensive Testing, Assessment & Accountability Program, July 1999; LEAP for the 21st Century High Stakes Testing Policy, Louisiana, May 2000; Handbook: Statewide Student Assessment System. Information for Parents, Students, Teachers, and other School Personnel, New Mexico, June 1999; communication with staff of State Department of Education and Office of Accountability, Oklahoma; Quality Counts 2000, Education Week.

Even as high-stakes testing becomes integrated into the system, many parents, civil rights activists, and educators are questioning the wisdom of relying on test scores for such decisions as student promotion and high school graduation. Parents in one of Michigan’s most affluent school districts recently rebelled against a new high school proficiency test that they claimed did nothing but embarrass students bound for college. Arguing against the inflated value of one test and the loss of local control, they organized student boycotts, political lobbying, and lawsuits to resist the test. Such tensions are making policymakers listen, and sometimes change their plans.

Tough Decisions for Policymakers

Putting state assessment programs into place is filled with tough decisions, each one creating its own tensions for policymakers. Decisions have to be weighed carefully, both to ascertain their educational value and to gain public support around issues of the design and appropriate use of tests.

Most state decision makers have learned that no single measurement instrument can do all things well. Tests designed to hold schools publicly accountable for student achievement are not the same tests that identify weaknesses or guide instruction; neither can they be used to set improvement targets for schools and districts. States have come to understand that their use of a test must match the purpose for which it was designed. Consequently, they’ve had to decide what they want their assessment programs to do, and develop, or select, a range of assessment strategies accordingly.

States have had to decide whether to develop their own assessments, designed specifically to address their own standards, or to rely on commercial assessments. Developing new tests that are aligned with standards is a major expense for states, which often have few resources available for this development. Purchasing tests may cost considerably less. However, while test publishers do try to align their test items with common elements in state standards, these tests are unlikely to align as closely as items in a test developed by the state.

States also have had to decide what to compare and how often to test students. Comparing one year’s fourth-graders against another’s may not provide a true picture of achievement because the test population is not the same. This was one of the most significant controversies in the implementation of Kentucky’s accountability system. State officials responded by spreading testing to more grades (Whitford & Jones, 2000). Some experts recommend annual testing at each grade level, arguing that annual testing localizes student performance to the most natural unit of accountability, the grade level or classroom. It also yields the most up-to-date information and limits the amount of data that is lost when students move to other schools and districts. While measuring individual student progress each year offers a more accurate assessment, this method is expensive and difficult to carry out among highly mobile student populations.

At the same time, states have had to decide whether to measure absolute performance or growth in performance. Some states, like Arkansas, recognize schools for both absolute levels of achievement and for growth. Louisiana schools, on the other hand, are given a growth target to reach within two years. In making these decisions, states have had to decide what is an acceptable level of performance and what constitutes satisfactory progress. Other questions to address include the following: Should the same rate of progress be expected all the time? How much growth is reasonable to expect? Should the same amount of growth be expected from schools that start at different achievement levels?
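The difference between judging absolute performance and judging growth can be made concrete with a small sketch. Everything in it is hypothetical: the function name, the 0-100 score scale, the growth target of 5 points, and the absolute goal of 80 are assumptions chosen for illustration, not any state's actual rules.

```python
def accountability_status(baseline, current, growth_target, absolute_goal):
    """Compare a school's score against a growth target and an absolute goal.

    All arguments are assumed to be scores on the same scale.
    """
    growth = current - baseline
    return {
        "growth": growth,
        "met_growth_target": growth >= growth_target,
        "met_absolute_goal": current >= absolute_goal,
    }


# Two made-up schools that start at very different achievement levels.
high_scoring = accountability_status(
    baseline=85, current=86, growth_target=5, absolute_goal=80
)
low_scoring = accountability_status(
    baseline=40, current=48, growth_target=5, absolute_goal=80
)
print(high_scoring)
print(low_scoring)
```

Under these assumed thresholds, the first school meets the absolute goal but misses its growth target, while the second exceeds its growth target and still falls well short of the absolute goal. This is why the choice of measure, and whether schools that start at different levels should face the same growth expectation, matters so much.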

Finally, states have had to face the particularly prickly issue of whether to control for differences in student, family, and community characteristics across students. Some districts believe that controlling for differences in prior achievement and student, family, and community characteristics across schools "institutionalizes low expectations for poor, minority, low-achieving students" (Elmore, Abelmann, & Fuhrman, 1996). Others argue that using data on these characteristics effectively would require collecting them for all students, increasing the data burden for districts, something only the largest districts may be prepared to handle. Most others generally have on hand only the limited administrative data that is available on students’ race, gender, eligibility for free or reduced-price lunches, special education, or limited English proficient (LEP) status.

Published in Insights on Educational Policy, Practice, and Research Number 11, August 2000, Tough Love: State Accountability Policies Push Student Achievement
