The goals of the No Child Left Behind (NCLB) Act are clear enough: Zero children left behind, with 100 percent proficiency in 12 years.
But while federal officials repeatedly affirm those goals, some states are undermining the clarity of the NCLB accountability framework by applying statistical tests that essentially say, “It all depends on what you mean by zero, 100 percent, and any number in between.”
Levels of Confidence
Perhaps the greatest challenge to the law’s promises is emerging from the way states are using statistical tools to determine passing rates on state tests and to report the benchmarks required for adequate yearly progress (AYP). Ironically, by applying higher levels of statistical “confidence,” many states are enforcing benchmarks that fall short of the passing rates they have promised to the U.S. Department of Education.
In Indiana, for example, state education officials lauded two schools in the town of Marion for having shown sufficient improvement that they were removed this year from the “needs improvement” list. But a quick review of the data reveals that one of the schools, Center Elementary, had actually declined in performance, falling from 55 percent passing in 2001-2002 to 43 percent passing in 2002-2003.
To the average citizen, 43 is plainly different from and less than 55; to a statistician, however, that judgment depends on the level of confidence attached to each number.
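To see the statistician’s point in concrete terms, consider a rough sketch. The calculation below is only illustrative: it assumes about 30 students were tested each year (the school’s actual enrollment is not reported here) and uses a simple normal-approximation interval, which may not match whatever method Indiana applies.

```python
from math import sqrt

def pass_rate_interval(p, n, z=1.96):
    """95 percent normal-approximation interval for a pass rate."""
    se = sqrt(p * (1 - p) / n)   # standard error of the proportion
    return p - z * se, p + z * se

n = 30  # assumed number of students tested each year (illustrative)
lo_02, hi_02 = pass_rate_interval(0.55, n)  # 2001-2002: 55 percent passing
lo_03, hi_03 = pass_rate_interval(0.43, n)  # 2002-2003: 43 percent passing

print(f"2001-2002: 55% (interval {lo_02:.0%} to {hi_02:.0%})")
print(f"2002-2003: 43% (interval {lo_03:.0%} to {hi_03:.0%})")
# The two intervals overlap, so a statistician applying this reasoning
# could call the 12-point drop indistinguishable from no change at all.
```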
Indiana is far from alone in applying statistical qualifications to NCLB accountability reporting. A recent review by the Chicago Tribune found 35 states are using some form of statistical manipulation to establish performance expectations that are lower than those being reported to federal officials.
The primary culprit in these states is their focus on “margins of error” or “confidence intervals” that are being applied to state testing data. State officials argue such tools are necessary to help ensure that schools are not punished for errors in the testing process or for swings in individual student performance.
The practical outcome is that target performance levels, those reported to the U.S. Department of Education, are being lowered by amounts that depend on the size of each student group and the variability of test scores found in similar groups. Thus, for example, Indiana has determined from its own test data that only 40 percent of students in a group of 30 will be required to pass in order to satisfy the 60 percent pass rate promised to federal officials. In other words, 12 passing students out of 30 will be treated, in reports to federal officials, as if they were the 18 the benchmark nominally requires.
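Indiana’s exact formula is not spelled out here, but a back-of-the-envelope sketch shows how such an adjustment could be derived. The Python below builds a normal-approximation confidence interval around the 60 percent target; both the formula and the confidence levels shown are illustrative assumptions, not Indiana’s documented method.

```python
from math import sqrt

def effective_floor(target, n, z):
    """Lower bound of a confidence interval around the target pass rate.

    A group whose observed rate clears this floor would be treated as
    having met the target. Illustrative only; not Indiana's published rule.
    """
    se = sqrt(target * (1 - target) / n)  # standard error for a group of n
    return target - z * se

target, n = 0.60, 30
for label, z in [("95% confidence (z = 1.96)", 1.96),
                 ("99% confidence (z = 2.58)", 2.58)]:
    floor = effective_floor(target, n, z)
    print(f"{label}: floor = {floor:.0%}, "
          f"about {round(floor * n)} of {n} students")
# Output: roughly 42 percent at 95 percent confidence and 37 percent at
# 99 percent confidence, bracketing the 40 percent figure cited above.
```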
The appropriateness of these actions is a subject of ongoing debate among educators, statisticians, and other researchers. Most seem to agree that state test results have some degree of variability that ought to be considered: students performing at or near the state’s passing level could move above or below that cut-off in multiple administrations of the test.
Concerns Already Addressed
But several safeguards are already being used by states to address those concerns. For example, states will not be required to consider small groups of students where such variability could have the greatest impact. For most states, that minimum group size is 30. Also, several states are using the averages of two or three years of testing data, another method that will smooth out random variability.
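The smoothing effect of multi-year averaging is easy to quantify. The sketch below assumes, purely for illustration, a true pass rate of 60 percent in a group of 30 students whose results vary independently from year to year; averaging the years shrinks the expected swing by the square root of the number of years averaged.

```python
from math import sqrt

true_rate, n = 0.60, 30  # illustrative assumptions, not any state's data
one_year_se = sqrt(true_rate * (1 - true_rate) / n)

for years in (1, 2, 3):
    se = one_year_se / sqrt(years)  # k-year average divides the SE by sqrt(k)
    print(f"{years}-year average: expected swing of about {se:.1%}")
# About 8.9% for one year, 6.3% for two, and 5.2% for three: the same
# random variability the confidence intervals are meant to absorb.
```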
Some observers have also suggested the flexibility is being applied in the wrong place. If individual test scores are where the variability occurs, then states ought to consider that when they set individual pass rates, not use that variability as a reason for lowering the number of students required to exceed the passing bar. Still others point out that confidence limits are appropriate only for sampling data, not for hard counts like those used in determining pass rates.
The issue raised by confidence limits is best understood by thinking about a public opinion poll. Data from such polls are typically reported with specific margins of error. For example, a poll might indicate a 55 percent approval rating for the President, with a margin of error of plus or minus four percentage points.
The margin of error is determined by a statistical measure called the standard error, which is the standard deviation of the poll’s sampling distribution; typically, pollsters use a range of about plus or minus two standard errors to produce a 95 percent “confidence level” for their result. In other words, returning to the example, the pollster is telling us there is a 95 percent confidence level that the true approval rating lies between 51 and 59 percent (55, plus or minus four).
A higher confidence level of 99 percent requires a wider range, roughly plus or minus two and a half standard errors, and therefore a wider margin of error. At a 99 percent confidence level, the Presidential poll would show an approval rating between roughly 50 and 60 percent (55, plus or minus about five).
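The arithmetic behind both margins can be checked directly. The sketch below assumes a sample of about 600 respondents, a figure chosen for illustration (the example poll states none) so that the 95 percent margin works out to roughly four points.

```python
from math import sqrt

def margin_of_error(p, n, z):
    """Margin of error for a polled proportion (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)

p = 0.55  # 55 percent approval in the example poll
n = 600   # assumed sample size (the example does not give one)

for label, z in [("95% confidence", 1.96), ("99% confidence", 2.58)]:
    moe = margin_of_error(p, n, z)
    print(f"{label}: {p:.0%} plus or minus {moe:.1%} "
          f"({p - moe:.0%} to {p + moe:.0%})")
# 95%: plus or minus ~4 points (51% to 59%)
# 99%: plus or minus ~5 points (50% to 60%)
```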
The 99 percent level produces such a wide margin of error that it is almost never used in polling, except, according to the Chicago Tribune survey, by 13 states in the development of their state accountability plans. Thus, while touting such lofty-sounding standards as “99 percent confidence” in their accuracy, these states have actually created confidence intervals so wide that schools like Center Elementary in Marion, whose pass rate fell by 12 percentage points, can be labeled “improving.”
The margins of error created by each state’s particular statistical treatment will depend on so many factors that it is impossible to predict which pass rates will ultimately qualify as meeting a state’s goal. But two things are certain: 100 percent will not mean 100 percent, and “No Child” clearly will not mean no child.
Derek Redelman is director of education policy for the Indianapolis-based Hudson Institute, where he also is a senior fellow. His email address is [email protected].
For more information …
Further explanation of statistics terminology, and a listing of books that provide a layman’s introduction to statistics, are available at Robert Niles’ Web page, under “Journalism Help: Statistics Every Writer Should Know,” at http://nilesonline.com/stats.