Ed departments need to look before leaping

A couple of days ago the Kentucky Department of Education sent out a glowing press release, “Report Ranks Kentucky in Top Ten for Core Academic Improvement.”

The release crowed about how well Kentucky looked in a new report from Harvard, “Achievement Growth: International and U.S. State Trends in Student Performance.”

I’m told the Maryland department was cheering about that state’s supposed performance, as well. Certainly, the Maryland story was picked up by the Washington Post.

In both cases, the claims were based on the Harvard team’s review of score improvements in reading, mathematics and science from the National Assessment of Educational Progress (NAEP).

Well, I know Kentucky’s NAEP performance isn’t quite so sterling; so, I took a look at the Harvard report to see how the researchers developed their findings. What I found was no surprise.

As usual when we hear these sorts of amazing claims about Kentucky’s public education system, the Harvard report only examines overall average student scores from the NAEP. The report thus ignores cautions found in all recent NAEP Report Cards: when comparing data between states, you have to consider the racial makeup of the school systems in those states and take note of unusual rates of exclusion of students from taking the NAEP, too.

The Harvard report uses some rather mysterious calculations (more on that later, if you click the “Read More” link below) to show that Kentucky’s overall academic growth on NAEP between 1992 and 2011, across the three subjects of reading, mathematics and science, ranked at number 7 among 41 states with NAEP data (see Table B.2 in the report; rank updated 28 Jul 12 to correct an error in the Kentucky Department of Education’s press release, which claimed Kentucky ranked 5th).

But, the report’s methodology, which, I reiterate, ignores warnings in the NAEP Report Cards themselves, overlooks some very important facts. And, the report creates an overly rosy picture for Kentucky, as well.

For example, as the table below shows, Kentucky’s black fourth grade students certainly didn’t share in all this wonderful success, at least not in math.

Black State G4 Math Improvement on NAEP '92 to '11

Black State G4 Math Improvement on NAEP '92 to '11 Notes
To read this table, notice that the 2011 scores appear first, followed by the 1992 scores and the difference in those scores (change). Where other states have the same change as Kentucky, the score change and the ranks are highlighted in yellow to signify a tie. (Note added 25 Jul 12 - This table ignores the statistical sampling error in the NAEP scores, which would further 'fuzzy up' the results, creating more ties for Kentucky and less differentiation between the states, as well.)
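To make that ‘fuzzy up’ point concrete, here is a little Python sketch of how sampling error turns nearby score changes into statistical ties. The score changes and standard errors below are made up for illustration; they are not actual NAEP figures.

```python
import math

# Hypothetical score changes (2011 minus 1992) and standard errors.
# State NAEP estimates typically carry standard errors of roughly a
# point; all numbers here are made up for illustration.
changes = {
    "State A": (17.0, 1.4),  # (score change, standard error of the change)
    "State B": (16.0, 1.3),
    "State C": (12.0, 1.5),
}

def distinguishable(a, b, z=1.96):
    """Two changes differ statistically only if their gap exceeds the
    combined sampling error at roughly the 95 percent level."""
    (chg_a, se_a), (chg_b, se_b) = a, b
    return abs(chg_a - chg_b) > z * math.hypot(se_a, se_b)

names = sorted(changes)
for i, s1 in enumerate(names):
    for s2 in names[i + 1:]:
        verdict = ("different" if distinguishable(changes[s1], changes[s2])
                   else "a statistical tie")
        print(f"{s1} vs {s2}: {verdict}")
```

Run it and State A and State B come out as a statistical tie even though their raw changes differ by a point, which is exactly why strict point-by-point rankings of NAEP changes overstate the differences between states.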

Among the 34 states and the District of Columbia that had reported grade four math scores for blacks in both 1992 and 2011, Kentucky wound up ranking way down near the bottom of the listing for progress, in a four-way tie for 27th place. Only five states had a slower rate of progress for black students on NAEP Grade 4 Math than Kentucky did.

The situation is nearly the same for Kentucky’s eighth grade black students in math, by the way. Kentucky ties for only 20th place out of 33 states with usable NAEP data for blacks.

The Harvard report’s authors claim that in most states that raised average scores, scores also went up for both top achieving students and low-achievers, too. In other words, as the report says on Page 23, “In most states, a rising tide lifted all boats.”

Well, Kentucky’s blacks clearly missed that boat. But you have to do more than limited research to learn that: you have to look beyond the “all students” NAEP scores before you can see it.

And, even if Kentucky’s whites did post somewhat better improvements (though definitely not even in the top 10 for math, let alone the top 5), the grim reality, as the map below shows, is that as of the 2011 NAEP Grade 8 Mathematics Assessment, Kentucky’s whites still scored very close to the bottom of the heap even after making that improvement.

G8 NAEP Math White Map

The new Harvard report claims in places like Figure 2 to have examined state improvements on the NAEP in reading, math and science between 1992 and 2011. Elsewhere the report indicates the scores examined are for both fourth and eighth grade testing.

I don’t think so.

As this table (developed from the online NAEP schedule) shows, the NAEP did not consistently conduct tests in these three subjects across all the grades and years indicated. In this table, a NAEP State Assessment in a given subject and grade was conducted only where the number “1” appears.

State NAEP Schedule 1992 to 2011

Notice that state level NAEP scores for eighth grade reading and for both grades in science don’t exist for 1992. The only subjects and grades the State NAEP tested in 1992 were fourth grade reading and math and eighth grade math.

Testing in the other subjects and grades didn’t start until later, sometimes MUCH later. For example, fourth grade science was not assessed for the first time until 2000. That leads to another problem in the report.
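If you want to check that trend math yourself, here is a small Python sketch that encodes just the start years cited in this post (not the full NAEP schedule) and flags which subject and grade combinations can even support a 1992 to 2011 trend.

```python
# First year each subject/grade was assessed by the State NAEP,
# using only the start dates cited in this post (not the full
# NAEP schedule). Checks which combos can support a 1992-2011 trend.
first_state_assessment = {
    ("reading", 4): 1992,
    ("reading", 8): 1998,
    ("math", 4): 1992,
    ("math", 8): 1992,
    ("science", 4): 2000,
    ("science", 8): 1996,
}

def trend_available(subject, grade, start=1992):
    """A trend beginning in `start` exists only if testing began by then."""
    return first_state_assessment[(subject, grade)] <= start

for (subject, grade), first in sorted(first_state_assessment.items()):
    status = "yes" if trend_available(subject, grade) else f"no (testing began {first})"
    print(f"Grade {grade} {subject}: 1992-2011 trend? {status}")
```

Only three of the six combinations pass, which is the heart of the problem with a report claiming 1992 to 2011 trends in all three subjects at both grades.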

The Science Problem – Terribly limited trend lines defy longitudinal analysis

The first State NAEP in eighth grade science wasn’t conducted until 1996. Fourth grade State NAEP in science started even later, in 2000, as already mentioned.

NAEP Science for both grades was not tested again until 2005. Then, after the 2005 science NAEP scores were issued, another problem for longitudinal analysis of NAEP results cropped up: the NAEP Science Framework underwent a major change before the next NAEP Science tests were given in 2009. The types of questions and the scope of material covered changed significantly.

Because of that major change, the NAEP 2009 Science Report Card says on Page 4:

“Because the 2009 assessment is based on a new framework, these results cannot be compared to those from previous assessments but instead will provide a baseline for measuring students’ progress on future NAEP science assessments.”

That’s pretty clear. The NAEP Science trend lines have been severed. You cannot conduct a meaningful long-term analysis of the NAEP Science results.

By the way, if you go into the NAEP Data Explorer, it actually prevents you from simultaneously accessing science scores from 2009 or later along with earlier results. You have to separately access the scores according to the Framework that was in use for the year of the assessment.

So, the NAEP folks really mean it. They don’t want you to try to analyze longitudinal changes in State NAEP Science performance across the 2009 Framework change.
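Here is a rough Python sketch, with made-up scores, of the rule the Data Explorer effectively enforces: refuse any science trend that spans the 2009 Framework change. Only the years come from this post; the function and data layout are illustrative.

```python
# A sketch of the rule the Data Explorer effectively enforces for
# science: refuse any trend that spans the 2009 Framework change.
# The years reflect this post; the scores and function are made up.
FRAMEWORK_CHANGE_YEAR = 2009

def science_trend(scores_by_year):
    """Return a first-to-last score change only if every year falls on
    one side of the 2009 Framework change; otherwise refuse."""
    years = sorted(scores_by_year)
    spans_change = years[0] < FRAMEWORK_CHANGE_YEAR <= years[-1]
    if spans_change:
        raise ValueError("Scores span the 2009 Framework change; "
                         "NAEP says these results are not comparable.")
    return scores_by_year[years[-1]] - scores_by_year[years[0]]

print(science_trend({1996: 145, 2000: 148, 2005: 150}))  # old Framework only: OK

try:
    science_trend({2005: 150, 2011: 152})  # mixes old and new Frameworks
except ValueError as err:
    print(err)
```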

The Harvard folks’ report doesn’t tell us what they did for their science analysis. Did they ignore this direction from the NAEP itself? Did they use only the limited (especially for fourth grade) data from the old Framework tests?

There is still more on science. After 2009, NAEP Science was again administered in 2011, but only to the eighth grade. Thus, there is absolutely NO longitudinal data currently available for fourth graders in State NAEP Science since the Framework was changed. And the eighth grade science trend from the new Framework series is extremely thin: just two test administrations, only two years apart.

As far as the old Framework NAEP goes, the only state trends in science that can reasonably be generated are for eighth grade from 1996 to 2005 (three test administrations) and for fourth grade from 2000 to 2005 (just two testing years of data). This data is also now rather dated.

Bottom line on NAEP Science: thanks to the Framework change, the trend lines for science were completely severed in 2009, so trying to do longitudinal analysis with NAEP Science is highly problematic. Sure, you can mix and match scores from the different Frameworks as a math exercise, but the NAEP folks themselves clearly do not want people to do that because the two test Frameworks are apparently very different and the results are not comparable.

In the end, I dropped my plan to examine the NAEP Science trends. Suitable data simply does not exist, not for an “all students” approach, nor for a breakout by race. Perhaps the Harvard folks know something I and the people who actually run the NAEP do not know.

The Reading Problem – Same old story – Researchers ignore Kentucky’s nation-leading exclusion

The Harvard folks totally ignore Kentucky’s nation-leading rates of exclusion of students with learning disabilities on the latest NAEP reading assessments in both grades four and eight. That inflates the state’s scores, though there is currently no consensus on how much or how to correct for the problem.

Still, as the recent NAEP 2009 Reading Report Card says on Page 6:

“Variations in exclusion and accommodation rates, due to differences in state policies and practices for identifying and including SD and ELL students, should be considered when comparing students’ performance over time and across states.”

The Harvard study didn’t do that.

Also, the State NAEP in Grade 8 Reading didn’t start until 1998. Implying there are data for earlier years is misleading, and I have no idea how the researchers combined the eighth grade trend line, which runs only from 1998 to 2011, with the fourth grade trend line available from 1992 to 2011.
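For what it’s worth, here is one plausible way (sketched in Python with made-up scores) a researcher might put trends of unequal length on the same footing: annualize each gain before comparing or averaging. I stress that the report doesn’t say whether the Harvard team did anything like this.

```python
# Made-up scores illustrating why raw gains over unequal spans
# mislead: annualize each gain before comparing or averaging.
# Values are (first year, last year, first score, last score).
trends = {
    ("reading", 4): (1992, 2011, 215, 226),
    ("reading", 8): (1998, 2011, 262, 269),
}

for (subject, grade), (y0, y1, s0, s1) in trends.items():
    raw_gain = s1 - s0
    per_year = raw_gain / (y1 - y0)
    print(f"Grade {grade} {subject}: {raw_gain} points over {y1 - y0} years "
          f"= {per_year:.2f} points per year")
```

Comparing the raw 11-point and 7-point gains directly would be meaningless when one accumulated over 19 years and the other over only 13.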

You Cannot Ignore Racial Demographics in Ranking States

In 2011 the racial achievement gaps on the NAEP remained very large. For example, nationwide in that year whites scored 293 on the NAEP Grade 8 Mathematics Assessment, while blacks scored more than 30 points lower at 262. Similar gaps are found for all the racial groups across individual states.

So, if you have a state like Kentucky, which is very ‘white’ (84 percent white in 2011 NAEP testing), that state has a huge advantage over a state like California, where whites make up only about 25 percent of the public school enrollment (State NAEP only reports public school scores). As soon as you start to analyze NAEP on a race-by-race basis, however, the picture starts to change. Note in the table at the beginning of this blog that California ranks at the very top for its fourth grade NAEP math improvement.

Because it only looks at overall average student scores, the Harvard study falls right into the NAEP demographic trap. Some remedial reading on Simpson’s Paradox is very much in order.
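For readers who want to see the trap in miniature, here is a tiny Python sketch of Simpson’s Paradox using made-up numbers: every subgroup in the ‘diverse’ state outscores the same subgroup in the ‘white’ state, yet the ‘white’ state posts the higher overall average purely because of its demographic mix.

```python
# Simpson's Paradox in miniature, with made-up numbers. Each subgroup
# in "Diverse State" outscores the same subgroup in "White State",
# yet "White State" posts the higher overall average because of mix.
# Values are (enrollment share, subgroup average score).
states = {
    "White State":   {"white": (0.84, 270), "black": (0.16, 240)},
    "Diverse State": {"white": (0.25, 272), "black": (0.75, 242)},
}

for name, groups in states.items():
    overall = sum(share * score for share, score in groups.values())
    detail = ", ".join(f"{g}: {score}" for g, (_, score) in groups.items())
    print(f"{name}: overall {overall:.1f} ({detail})")
```

The ‘white’ state wins the overall comparison by more than 15 points despite losing every subgroup comparison, which is exactly why rankings built on overall averages alone can’t tell you which state’s schools are actually doing a better job.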

Learning about NAEP’s pitfalls

None of the problems highlighted above are news to us. In fact, the Bluegrass Institute has had a great article about the National Assessment of Educational Progress, including a section discussing these issues, online for several years. May I suggest adding this to the required reading lists at Harvard and Stanford and some state education departments? It would solve a lot of problems in future research efforts using the NAEP.