When a sports player wants to work on improving his or her game, it’s not very productive to spend time simply looking at the scores of recent games.
The first step in the process usually involves poring over detailed game statistics and repeatedly playing videotapes of those recent games to identify strengths and weaknesses. Once the weaknesses have been identified, the player can develop a strategy for improvement.
In education, the diagnostic tools available for improving teacher performance have, until recently, been limited largely to the use of raw student test scores. William L. Sanders, an innovative researcher at the University of Tennessee, changed all that during the past decade with the development of an analytical procedure that enables teachers to see where their teaching is effective and where it’s ineffective, thus providing a solid starting point for improvement.
Since 1992, Sanders’ Value-Added Assessment System has been the guiding force behind Tennessee’s school improvement efforts, no mean achievement for a research-oriented statistician working outside his field.
With a doctorate in biostatistics and quantitative genetics, Sanders worked at the Oak Ridge National Laboratory before taking over a statistical analysis center for agricultural research at the University of Tennessee. Now, instead of conducting classes on statistical mixed models, Sanders spends much of his time explaining his value-added technique to interested lawmakers across the country. He recently spoke with School Reform News Managing Editor George Clowes.
Clowes: How did you get involved in value-added analysis in education?
Sanders: It got started in the late 1970s, when Governor Lamar Alexander was advocating a merit pay plan for Tennessee teachers. A big issue at that time was, and still is, how are you going to measure teacher effectiveness?
I got involved when our legislators were told that you could not use student achievement data to measure teacher effectiveness. I said that you could, and got access to Knox County Student Achievement Test data to prove it.
By the summer of 1982, I had completed my analysis. We made a lot of presentations, but nothing happened until 1989, when our legislature and Governor McWherter were looking for an accountability system for public education. They heard about our work and we presented it to them in early 1990.
When the Educational Improvement Act of 1992 was enacted, our methodology became the cornerstone of the state’s accountability system, the Tennessee Value-Added Assessment System. I then put together a small team at the university to build a software system to apply the methodology on a statewide basis.
The first value-added reports were released at the district level in 1993, at the school level in 1994, and at the teacher level in 1996; reports for all levels have been released annually since then. The district- and building-level information is required by statute to be publicly released. However, the teacher-level information is made available only to the teacher, the school board, and usually the principal.
Clowes: Have the reports led to teachers being fired or changing the way they teach?
Sanders: To my knowledge, no teacher has been fired where the reports were the justification for it. This is not about firing people. This is about measurement, about producing what I call the river of diagnostic information to show individual teachers where their relative strengths and weaknesses are.
The way that this information has been used in this state varies enormously from school district to school district. In those districts where local leadership put in the time and effort to assist principals and teachers to learn how to use this information diagnostically, there is definitely measurable progress. We’ve got other districts that have totally and completely ignored it. But when you plot the trend lines of student populations from different districts, you can see which of them have made very positive progress and which haven’t.
Let me give you two teacher reactions to the reports.
Last night, I talked to an eighth-grade teacher who thinks that value-added assessment is the worst thing that ever happened to his profession. He said, “I have often literally seen teachers open their value-added report and cry.” I’m sure that’s true. I am not in the business of winning a popularity contest.
What he was saying was that we must quit doing it because of the emotional harm to teachers. But he’s focusing on the teacher, not on the child. When we have a mountain of accumulating evidence about the effect of teacher sequence on a child’s academic achievement level, it’s just flat wrong to sweep it under the rug.
A lot of the people who are relatively ineffective teachers are sincere, conscientious, dedicated human beings who often don’t know that they’re relatively ineffective. They don’t know why they’re ineffective; they don’t know where they’re ineffective. The measurement of their effectiveness is the key, and we have a measure that is objective, repeatable, and reliable. The idea is to get teachers to confront that measure of their effectiveness and to respond positively to it. That’s where my second example comes in.
I had a fifth-grade math teacher come to see me in early June to show me her first-year teacher report. “I was devastated when I got that report,” she told me. “I’d cry awhile, and then I’d cuss you awhile, back and forth.”
I looked at her report and I said, “Golly, it’s not bad. It showed that you’re pretty much an average teacher.”
“That’s what was devastating about it,” she said. “I’d always thought of myself as being a superior teacher.”
Her supervisor told her to ignore the report because “Everybody knows that you’re a good teacher.” But when she mentioned it to one of her class aides, the aide commented, “Well, I think the students understand things the day you go over it, but I don’t believe they’re retaining it.” Then, the teacher said, “All the lights in the room went on for me.”
What she came to show me was the system that she had developed to track the progress of every one of her students through the 150+ domains that she covers in the school year, and how she loops back a month later to make sure they’re still retaining it. What’s happened is that her value-added gains have more than doubled.
She now has other teachers in her building doing the same thing, and their gains have gone up dramatically, too. This is a teacher who evolved her own teaching process because she was devastated by the value-added report. Now she’s one of the most zealous missionaries for value-added assessment in this state.
Clowes: How does your analytical procedure gauge a teacher’s effectiveness?
Sanders: First, imagine a child’s physical growth curve. Often, parents can go somewhere in their house and there will be marks on the wall where the little girl was so tall when she was two, and four, and five years old. We could plot that on a piece of graph paper and get some measure of that child’s physical growth. Now, there could be all kinds of errors in those measurements and so it’s not a smooth, elegant line, but it still gives us a measure of physical growth.
Now, instead of height, let’s plot the child’s math scores over time. That’s not necessarily a smooth, elegant line, either. But if we were to see a flat spot in the child’s math curve between third and fourth grade, there’s nothing that we could accurately conclude about the effectiveness of the child’s fourth-grade teacher.
On the other hand, if we look not at one child’s graph, but at a whole classroom of children for three years with that teacher–if we see that most children have flat spots between third and fourth grade, that is huge, powerful evidence that something is not going on instructionally for those children with that teacher. That is the conceptual essence of what we’re doing.
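The contrast between one child's flat spot and a classroom full of them can be sketched in a few lines of Python. This is an illustration only, not the mixed-model analysis Sanders describes: the scale scores are made up, and the function names and the 5-point `flat_threshold` are assumptions chosen for the example.

```python
from statistics import mean

def gains(prior_scores, current_scores):
    """Per-child gain: this year's score minus last year's score."""
    return [cur - pre for pre, cur in zip(prior_scores, current_scores)]

def flat_spot_share(child_gains, flat_threshold=5):
    """Fraction of children whose gain is at or below flat_threshold points.

    flat_threshold is an arbitrary cutoff assumed for this sketch.
    """
    flat = [g for g in child_gains if g <= flat_threshold]
    return len(flat) / len(child_gains)

# Made-up scale scores for one hypothetical classroom (grade 3, then grade 4).
grade3 = [610, 625, 640, 655, 670, 690]
grade4 = [612, 628, 641, 700, 673, 692]

class_gains = gains(grade3, grade4)
print(class_gains)                   # one gain per child
print(mean(class_gains))             # classroom average gain
print(flat_spot_share(class_gains))  # share of children with a flat spot
```

A single near-zero gain says little on its own, but a high share of near-zero gains across the class is the kind of signal the passage above describes.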
Clowes: So this allows you to draw some conclusions about the relative merits of teachers within a district?
Sanders: Absolutely. If you were to see that the district average has, say, 60 scale points of growth in children, and this particular teacher’s classroom achieves only 40 consistently, then that is strong, powerful evidence that that teacher is not being nearly as effective as the district average. But you can even go further than that–you can go down to that particular teacher’s classroom and plot a simple graph of this year’s gain for each child versus the child’s previous achievement. By looking at the pattern on that graph, you can see which students are not making progress.
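The 60-versus-40 comparison can be written out as a small Python sketch. It uses the two figures from the example above; the year labels, the function names, and the three-year minimum for calling a shortfall "consistent" are assumptions made for illustration, not part of the Tennessee system.

```python
def gain_gap_by_year(class_gain_by_year, district_gain_by_year):
    """Classroom mean gain minus district mean gain, for each year."""
    return {
        year: class_gain_by_year[year] - district_gain_by_year[year]
        for year in class_gain_by_year
    }

def consistently_below(gaps, min_years=3):
    """True if the classroom trails the district in every year on record.

    min_years is an assumed minimum amount of data for this sketch.
    """
    return len(gaps) >= min_years and all(gap < 0 for gap in gaps.values())

# Hypothetical mean gains in scale points, using the 60 and 40 from the text.
district = {"year 1": 60, "year 2": 60, "year 3": 60}
classroom = {"year 1": 40, "year 2": 40, "year 3": 40}

gaps = gain_gap_by_year(classroom, district)
print(gaps)
print(consistently_below(gaps))
```

Requiring several years before drawing a conclusion keeps a single unusual class from driving the result.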
The most frequently observed pattern is a downward sloping line from left to right, which I’ve labeled the “shed” pattern. That pattern says that that teacher is allowing the previously lower-scoring students to make more gain than the average or above-average students in that classroom. When you see that pattern, it’s a dead cinch giveaway that that teacher is pacing everything to a few previously low-scoring students in the classroom and holding the other students back.
Another pattern is what I call the “tepee” pattern, where the students in the middle are making more gain than the students on either side. That’s where the teacher is saying, “I’m teaching fourth grade. I’m teaching one lesson. It’s over some students and under some others.”
What we have found is that these gain patterns are the most revealing and helpful tools to teachers and principals, because often they haven’t even realized what they were doing. It doesn’t tell teachers what to do, but it certainly points out those regions that need attention.
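One rough way to look for these patterns in code is to split a classroom into thirds by previous achievement and compare the average gain in each third. The Python sketch below does that. The tercile split, the 5-point `tolerance`, the function names, and the data are all assumptions for illustration; the interview does not say how the patterns are identified in practice.

```python
from statistics import mean

def tercile_mean_gains(prior_scores, gains):
    """Mean gain for the low, middle, and high thirds of prior achievement."""
    if len(prior_scores) != len(gains) or len(gains) < 3:
        raise ValueError("need at least three students with matching scores and gains")
    ranked = sorted(zip(prior_scores, gains))  # order students by prior score
    third = len(ranked) // 3
    groups = (
        ranked[:third],
        ranked[third:len(ranked) - third],
        ranked[len(ranked) - third:],
    )
    return tuple(mean(gain for _, gain in group) for group in groups)

def gain_pattern(prior_scores, gains, tolerance=5):
    """Label the gain pattern as 'tepee', 'shed', or 'no clear pattern'.

    tolerance (in scale points) is an assumed cutoff for this sketch.
    """
    low, mid, high = tercile_mean_gains(prior_scores, gains)
    if mid > low + tolerance and mid > high + tolerance:
        return "tepee"  # the middle group gains more than either side
    if low > high + tolerance:
        return "shed"   # gains fall as prior achievement rises
    return "no clear pattern"

# Made-up prior scores and gains for three hypothetical classrooms.
priors = [600, 610, 620, 630, 640, 650, 660, 670, 680]
shed_gains = [70, 68, 66, 50, 48, 52, 30, 28, 32]
tepee_gains = [30, 32, 28, 60, 62, 58, 31, 29, 30]
even_gains = [50] * 9

print(gain_pattern(priors, shed_gains))
print(gain_pattern(priors, tepee_gains))
print(gain_pattern(priors, even_gains))
```

A plotted graph shows more detail than three group averages, so this is best read as a first screen rather than a replacement for looking at the graph.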
The metric I’ve created is called percent cumulative norm gain. You want every school to be at least 100 percent. The schools in our state will range from 150 percent down to 60. If a school is at 60 percent, that means the average child in that school is making only 60 percent of the gain needed for one year’s worth of growth per year at school. A 60 percent school is an awful school. But I can show you schools that would be 120 and others that would be 70–both drawing children from the same low-income neighborhood.
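The interview does not give the formula behind this metric. A simplified reading, assumed here purely for illustration, is the total gain a school's average child makes across grades, expressed as a percentage of the total gain expected under the norm. The function name and the per-grade numbers in this Python sketch are hypothetical.

```python
def percent_cumulative_norm_gain(observed_gains, norm_gains):
    """Summed observed gains as a percentage of summed norm gains.

    Assumed simplification: one mean gain per grade for the school,
    and one expected (norm) gain per grade.
    """
    total_norm = sum(norm_gains)
    if total_norm <= 0:
        raise ValueError("norm gains must sum to a positive number")
    return 100 * sum(observed_gains) / total_norm

# Hypothetical scale-point gains over four grades.
norm = [25, 25, 25, 25]
weak_school = [15, 15, 15, 15]
strong_school = [30, 30, 30, 30]

print(percent_cumulative_norm_gain(weak_school, norm))    # below 100
print(percent_cumulative_norm_gain(strong_school, norm))  # above 100
```

On this reading, a school at 100 percent is delivering exactly one year of growth per year, which matches the target described above.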
Clowes: Is there any reason why students in schools with high concentrations of poverty should learn any less than students in an affluent district?
Sanders: Interestingly, I’ve caught the most political heat from some of the schools in affluent areas, where we’ve exposed what I call “slide and glide.” One of the top-dollar districts in the state had always bragged about its test scores, but our measurements showed that their average second-grader was in the 72nd percentile. By the time those children were sixth-graders, they were in the 44th percentile. Under our value-added scheme, the district was profiled in the bottom 10 percent of districts in the state. They were not happy. You’d think I had nuked the place.
With our value-added approach, we can demonstrate that our measure of school effectiveness is totally unrelated to traditional socioeconomic indicators. We have more than 1,300 elementary schools in this state; their effectiveness is totally unrelated to the racial composition of the school or the percentage of children in the federal free and reduced-price lunch program. That’s looking at measures of progress, not at raw test scores.
You shouldn’t hold teachers and principals of school districts accountable for things over which they have no control. You should hold them accountable for those things they do have control over. Schools and teachers don’t have control over the achievement level when children walk in the door, but they do have control over how much that level is raised during the year.
If that is sustained over time, it becomes like compound interest, and what you see is populations of children constantly rising to higher and higher levels of achievement in later grades, regardless of where they started.
Clowes: What do legislatures need to do to put your system in place?
Sanders: Several states are discussing it. The state of Florida has enacted legislation, as I understand it, to move to a value-added or “gain” model in about 2001. They also enacted legislation to start testing each child each year. Arizona is looking very seriously at testing each child each year in math and reading. Delaware and Mississippi are considering it, too.
Every child needs to be tested every year. If districts have been testing every year in one form or another, those results can be used to start getting diagnostic information before the state has completed testing with its annual regime. A state should pick four or five districts as pilots to work out a lot of the wrinkles while bringing on the statewide testing system. I recommend that none of the results be used as part of a formal evaluation until you get at least three years of data for each teacher.
One problem I see is that some folks are talking about doing some rather simplistic things with value-added models. My approach is often criticized because of its complexity, but it’s complex for a very important reason. It’s complex because we’re trying to make it close to mathematically impossible for a teacher or school to get a false negative due to some quirk in the data. The complexity is there to protect against spurious results. That takes big computers and lots of sophistication. You can’t do it on a shoestring, and you can’t do it with simplistic approaches.
Clowes: What does it all cost?
Sanders: You can put all of the testing and analysis in place for less than one-half of 1 percent of annual per-pupil expenditures. If a state is spending $5,000 per child annually, one-half of 1 percent is $25. If you gave me a contract to provide your testing and analysis for $25 per child, I’d take it in a heartbeat–and I’d make out like a bandit. If teachers and principals are using the results to drive academic achievement over time, that’s a trivial expense.
For more information …
A report by William Sanders on the effect of teachers on student achievement, “Cumulative and Residual Effects of Teachers on Future Student Academic Achievement” (1996, 14 pp.), is available through PolicyBot at http://www.heartland.org/policybot/results/3048
Also available through PolicyBot is a report by J.E. Stone, “What Is Value-Added Assessment and Why Do We Need It,” published by The Foundation Endowment in 1999 (13 pp.), at http://www.heartland.org/policybot/results/6822