Should “performance-based” tests be mandated?

By Sandra Stotsky

Published December 15, 2020

Parents are being told that the first big issue for a new Secretary of Education in the Biden administration is deciding whether schools should test students in 2021. However, a much bigger issue is whether schools should continue to give Common Core-aligned tests or switch to “performance-based” tests once national testing resumes, as it is likely to do in the absence of any other form of accountability for the U.S. Department of Education (USED) and the states to use.

Many educators who think they are “reformers” would like performance-based tests. They are encouraging other educators to “rethink” accountability “by replacing high-stakes exams with performance-based assessments.” But even if Congress decides to eliminate the tests now given (which are all aligned to Common Core’s standards), schools will still have to give tests based on “college-ready” standards, because states promised to do so under the Every Student Succeeds Act (ESSA), a reauthorization of the Elementary and Secondary Education Act passed by Congress in December 2015. Moreover, most non-state K-12 standards described as “college-ready” are already based on Common Core’s. Any new tests that Congress or USED may require states to use in place of their current “college-ready” tests will have to be based on the same “college-ready” standards that state Departments of Education adopted when they signed onto a four-year State Plan for ESSA after 2015 (a plan that needed approval only from USED itself) in order to receive Title I money. The new tests may even be called “performance-based,” and parents and teachers will likely consider them better than the Common Core-aligned “standardized” tests now in use.

Educators and parents were told before 2010 that “equity” was one reason for adopting Common Core’s standards. How else could the performance of low achievers be compared to the performance of other students in this country if they didn’t all take the same test? How could teachers know who the low achievers really were? Actually, their low performance was well known to all teachers. All a teacher had to do was look at their writing. That is still all they have to do. Writing has always been related to reading level.

Parents and state and local school boards were also told that addressing Common Core’s standards in their curriculum would make all students college-ready. Not only has this aspiration gone unrealized, but the “gap” between the lowest and highest achievers has also widened. Test-based accountability would close the gaps, we were told. Yet even in Finland, most high school graduates do not go on to a university; vocational education is more common than a university education.

As the prominent Finnish educator Pasi Sahlberg tells us (p. 25), Finnish teachers in upper secondary school regularly test students five or six times per subject per school year, using teacher-made tests. There are also “matriculation” tests at the end of high school, required of students who want to go to a university. Indeed, Finnish students take lots of tests, just not mandated tests constructed by testing companies for the elementary grades, which are the grades where American students are most heavily tested.

Why should Americans now be more interested in testing than ever before? Because performance-based tests sound so appealing. Who wouldn’t look favorably, as a means of accountability for federal money, on a test that “accurately measures one or more specific course standards” and is also “complex, authentic, process and/or product-oriented, and open-ended,” as Patricia Hilliard describes performance-based assessments in her blog?

Two states, Vermont and Kentucky, have already discovered deep problems with performance-based assessments in the form of portfolios. A 1993 government publication warned readers about some of the problems with portfolios, a commonly used form of performance-based assessment: “Users need to pay close attention to technical and equity issues to ensure that the assessments are fair to all students.” It turns out that portfolios are not suitable for high-stakes assessment. In a nutshell, they are costly, time-consuming, and unreliable. Quoting one of the researchers/evaluators of the Vermont initiative, the 1993 publication states: “The Vermont experience demonstrates the need to set realistic expectations for the short-term success of performance-assessment programs and to acknowledge the large costs of these programs.” The researchers quoted there state in their own blog that they “found the reliability of the scoring by teachers to be very low in both subjects… Disagreement among scorers alone accounts for much of the variance in scores and therefore invalidates any comparisons of scores.” The researchers also emphasized the poor quality of the data from these tests in another government publication.

Commenting on the “failed accountability system” in Kentucky after years of “reform,” education professor George Cunningham observed:

Historically, the purpose of instruction in this country has been increasing student academic achievement. This is not the purpose of progressive education, which prefers to be judged by standards other than student academic performance. The Kentucky reform presents a paradox, a system structured to require increasing levels of academic performance while supporting a set of instructional methods that are hostile to the idea of increased academic performance (pp. 264-65).

That is still the dilemma today: skills-oriented standards assessed by “standardized” tests that must include some multiple-choice questions to ensure a reliable assessment. Congress and teachers still want students to show whether or not they know something.

Cunningham also warned against using performance assessments for large-scale testing (p. 288). Concluding that “Performance Events were expensive and presented many logistical headaches,” he noted:

The biggest problem with using performance assessments in a standards-based accountability system, other than poor reliability, is the impossibility of equating forms longitudinally from year to year or horizontally with other forms of assessment. In Kentucky, because of the amount of time required, each student participated in only one performance assessment task. As a result, items could never be reused from year to year because of the likelihood that students would remember the tasks and their responses. This made equating almost impossible.

Further details on the problems of equating “Performance Events” appear in a January 1998 technical review by James Catterall and four others for the Commonwealth of Kentucky Legislative Research Commission and in a 1995 analysis of Kentucky’s tests by Ronald Hambleton et al., while Richard Innes at Kentucky’s Bluegrass Institute gives a slightly optimistic account of what could be learned from the attempt to use writing and mathematics portfolios. For more on the costs and benefits of student testing, see “Cost of Standardized Student Testing,” “Dismissive Reviews and Citation Cartels,” and “Dismissive Reviews in Education Policy Research.”

Concluding Remarks:

The main point of switching to highly subjective performance-based assessments seems to be that they remove any urgent need for content-based questions. Content was exactly what the planning documents for teacher licensure tests in Massachusetts (required by the Massachusetts Education Reform Act of 1993) sought to secure. They specified more multiple-choice questions than essay questions on content (the tests all included both), and they required that content experts, along with practicing teachers holding that license and education school faculty who taught methods courses (pedagogy) for that license, take part in constructing, revising, and approving the tests. With the help of National Evaluation Systems (NES), the Bay State’s teacher licensure test developer, the state was able to get more content experts involved in the test approval process. What Pearson, the British company that co-owns these tests, has done since its purchase of NES over ten years ago is unknown. Education researchers rarely analyze the content of teacher licensure tests, and test developers may not tell the states exactly what changes they have made to a previous test.

For example, Common Core’s beginning reading standards were added to the test description for Foundations of Reading (90), a well-known licensure test for prospective teachers of young children developed in Massachusetts in 2001. Sample test items assessing these standards were also added to the original NES Practice Test for prospective teachers. We don’t know what changes, if any, were made to that licensure test (used today by six other states to assess an aspiring teacher’s knowledge of beginning reading issues or research-based strategies) or to Common Core-aligned licensure tests for mathematics. Even where Common Core’s standards have been revised or eliminated, their negative influence may remain in some of the licensure tests that states had already developed, revised, or adopted.

It is time for state legislators to scrutinize the costs and benefits of “performance-based assessments” before agreeing with those who want to eliminate standardized tests on the grounds that performance-based tests give teachers a better understanding of what students have learned. Those making this argument may simply want to get rid of the “standardized” tests now in use. But performance tests are not only apt to be more expensive and time-consuming to give; they are also apt to be less informative to parents and classroom teachers.