What qualities REALLY count in education? Part 3

On problems with education research, state education rankings and teacher preparation

I have written several blogs over the past few days about an annual rite of passage in the education world. On January 12, 2012, Education Week released its annual report on education across the United States, known as Quality Counts (subscription).

Kentucky’s education boosters jumped on the report’s new state rankings, which showed Kentucky moving up 20 places in the Quality Counts rankings in just one year (Really???).

It’s hard to justify so much celebration when Kentucky earned an unimpressive “C+” overall in Quality Counts, but the jump in the rankings does sound impressive (assuming you believe a state can change its education system that much, that quickly), until you ask some very basic questions:

What qualities really count in education, and does Quality Counts do a good job of identifying and grading them? For that matter, do many involved with education really know the answers about what REALLY makes up a quality education system?

In my first two blogs (here and here), I talked about Quality Counts using a less accurate formula for graduation rates and how that resulted in Kentucky looking better than it does with a more credible graduation rate calculation.

In the most recent blog, I discussed some disturbing evidence that Quality Counts’ very high ranking for “The Teaching Profession” in Kentucky is sharply at odds with brand-new rankings from the National Council on Teacher Quality.

Now, let’s discuss all the ways Quality Counts gets in trouble with its extensive but simplistic use of National Assessment of Educational Progress (NAEP) scores.

Very briefly, Quality Counts does just about everything wrong in its state-to-state NAEP comparisons.

• Quality Counts totally ignores Kentucky’s nation-leading exclusion of students with learning disabilities from the NAEP reading assessments. The report never even mentions it.

• Quality Counts only examines potentially very misleading overall “all student” NAEP scores and never provides a clue that things might, and do, look very different when those scores are disaggregated by race.

• Quality Counts ignores the fact that all NAEP scores come from a statistically sampled test and are only estimates with considerable sampling error. Thus, it is possible for one state to somewhat outscore another when in reality the second state performs the same as, or even somewhat better than, the first. Instead, Quality Counts simplistically ranks scores reported to the tenth of a point as though such small differences were meaningful.

All of these NAEP issues are no surprise to our regular readers. The surprise is that a normally high-quality news source would be enmeshed in them.

Once again, I want to start by making it clear that I find Education Week’s newspaper and supporting blog efforts to be very valuable. I have communicated with a number of EdWeek reporters over the years, and I have found them very knowledgeable about education in general and about their specific fields of emphasis. They communicate their knowledge with considerable skill.

Unfortunately, Quality Counts isn’t created by line reporters at Education Week (they do write some of the interesting articles, however). Here is more on how Quality Counts didn’t do a quality analysis of the NAEP.

As mentioned earlier, Quality Counts’ treatment of the NAEP falls short in at least three important areas:

1) There isn’t even a mention of the potential impact on scores caused by very different exclusion rates from state to state for English language learners and students with learning disabilities.

2) Wildly different racial demographics now found in the various states, which seriously complicate state-to-state NAEP analysis, are also ignored.

3) NAEP’s statistical sampling errors are also completely ignored.

As things turn out, each of these issues creates a huge advantage for Kentucky in Quality Counts’ overly simplistic analysis, seriously undermining the credibility of the report.

Without doubt, the NAEP is important. Quality Counts recognizes that by counting NAEP results in numerous sections of its calculations, giving these test results a lot of weight in the final rankings.

The first place where NAEP plays a role is in Quality Counts’ “Chance for Success” part of the rankings.

Here Quality Counts includes factors based on NAEP proficiency rates in fourth grade reading and eighth grade mathematics.

The NAEP receives especially heavy weight in the “K-12 Achievement” section.

There are separate items for proficiency rate rankings for fourth grade reading, eighth grade reading, fourth grade math and eighth grade math.

But (at the risk of sounding like a TV ad), there is more. There are four more “K-12 Achievement” calculations for the change in NAEP math scale scores between 2003 and 2011 for both fourth and eighth grade and similar calculations for both grades in NAEP reading.

Then, two more NAEP score-dependent calculations are included, one for the difference in fourth grade reading scores and another for the difference in eighth grade math scores between students who qualify for the federal school lunch program (a poverty indicator) and those who do not qualify.

We are still not done, though.

There are two more NAEP-based items under the “K-12 Achievement” section. One is for the percentage of students in each state scoring at NAEP’s highest level, “Advanced.” The other is for the change in this statistic between 2003 and 2011.

To sum up, NAEP is used extensively in the Quality Counts rating. However, with the exception of the school lunch items, all of these calculations are based solely on overall “all student” scores. None of the scores are adjusted in any way for different state demographics, and mum’s the word about possible impacts from highly variable exclusion rates for some student groups.

How does this benefit Kentucky unfairly?

1) Kentucky led the nation in excluding learning disabled students from NAEP reading in both fourth and eighth grade. We excluded a whopping eight percent of the entire raw sample of fourth grade students NAEP wanted to test for reading as supposedly too learning disabled to participate. We also led in exclusions from the eighth grade reading assessment. These students would score very low, as a group, if they took the NAEP. By excluding so many of them, Kentucky inflates its reading scores. I know it, Kentucky Commissioner of Education Terry Holliday knows it, the Florida Commissioner of Education knows it (and has filed a complaint about it), and so do others.

In fact, Education Week’s Quality Counts staff does EXACTLY what Kentucky’s Prichard Committee does, which NAEP expert Bert D. Stoneberg, NAEP State Coordinator for the Idaho Department of Education, says should not be done.
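
To make the exclusion effect concrete, here is a minimal back-of-the-envelope sketch in Python. The eight percent exclusion rate matches Kentucky’s reported figure, but the two score values are purely hypothetical, chosen only to show the direction and rough size of the inflation:

```python
# Back-of-the-envelope sketch: how excluding low scorers inflates a state mean.
# The 8 percent exclusion rate matches Kentucky's reported figure; the two
# score values below are hypothetical, used only to illustrate the mechanism.

excluded_share = 0.08    # fraction of the raw NAEP sample excluded
included_mean = 220.0    # hypothetical average of students actually tested
excluded_mean = 180.0    # hypothetical average the excluded group would earn

# The reported score reflects only the students who actually took the test.
reported_mean = included_mean

# What the average would be if the full population had been tested.
full_population_mean = ((1 - excluded_share) * included_mean
                        + excluded_share * excluded_mean)

print(f"Reported mean:        {reported_mean:.1f}")         # 220.0
print(f"Full-population mean: {full_population_mean:.1f}")  # 216.8
print(f"Inflation:            {reported_mean - full_population_mean:.1f} points")  # 3.2
```

Three points may not sound like much, but in rankings where states sit a fraction of a point apart, an inflation of that size can move a state up many places.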

2) Kentucky’s school system is one of the ‘whitest’ in the country. At a time when whites now form a minority in states like California and New Mexico, Kentucky’s public schools remain 84 percent white in the new NAEP results. Because white students, as a group, score much higher than minority students, states with large minority enrollments are at a huge disadvantage in any NAEP comparison that unfairly looks only at “all student” results. YOU HAVE TO DISAGGREGATE NAEP DATA BY RACE TO GET A REAL IDEA OF HOW STATE EDUCATION SYSTEMS ARE PERFORMING.

The really sad thing is that the NAEP Data Explorer online tool now allows anyone to easily disaggregate data by race. There really is no good excuse for Quality Counts not doing this.
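
To see why disaggregation matters so much, here is a minimal Python sketch using made-up numbers (the demographic shares and scores are hypothetical, not actual NAEP data). It shows how a ‘whiter’ state can post a higher “all student” average even when every one of its student groups scores lower than the comparison state’s, the classic Simpson’s paradox:

```python
# Hypothetical sketch of Simpson's paradox in "all student" NAEP averages.
# State A is 85 percent white; State B is 50 percent white. The shares and
# scores below are made up, but the pattern is the one described above.

state_a = {"white": (0.85, 225.0), "minority": (0.15, 200.0)}  # (share, mean)
state_b = {"white": (0.50, 228.0), "minority": (0.50, 205.0)}

def overall_mean(state):
    """Demographically weighted 'all student' average."""
    return sum(share * mean for share, mean in state.values())

print(f"State A overall: {overall_mean(state_a):.2f}")  # 221.25
print(f"State B overall: {overall_mean(state_b):.2f}")  # 216.50

# State A "wins" the overall comparison by almost five points even though
# State B outscores it within BOTH student groups. Only disaggregation
# reveals which system is actually performing better.
```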

3) You cannot ignore the statistical sampling errors in NAEP results, but Quality Counts commits this error, too. The NAEP Data Explorer offers straightforward ways to deal with this issue as well, though that has so far escaped Quality Counts.
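
The standard remedy is a significance test on the difference between two estimates, which is essentially what the NAEP Data Explorer does before calling two scores different. Here is a simplified sketch with hypothetical means and standard errors:

```python
# Simplified version of the significance check the NAEP Data Explorer applies
# before calling two estimates different. All numbers below are made up.

from math import sqrt

def significantly_different(mean_1, se_1, mean_2, se_2, z_critical=1.96):
    """Two-sided z-test on the difference of two independent NAEP estimates."""
    z = (mean_1 - mean_2) / sqrt(se_1**2 + se_2**2)
    return abs(z) > z_critical, z

# State X reports 224.3 (standard error 1.1); State Y reports 223.1 (SE 1.0).
different, z = significantly_different(224.3, 1.1, 223.1, 1.0)
print(f"z = {z:.2f}, statistically different: {different}")
# z = 0.81, statistically different: False. A 1.2-point "lead" that a simple
# ranking would treat as real is actually indistinguishable from noise.
```

In other words, ranking states on score differences of a tenth of a point, as Quality Counts does, treats pure sampling noise as a meaningful quality gap.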

How to fix this – Quality Counts staffers can start by reading The National Assessment of Educational Progress. This easy-to-understand resource shows how failing to account for the issues discussed above can lead to very serious misconceptions about state-to-state performance on the NAEP. It also includes examples showing that better rankings, ones that allow for demographics and sampling error, are not difficult to develop.