Rankings of Kentucky’s educational performance still flawed – Déjà vu
Prichard Committee and KY Chamber get their rankings wrong
Back in January, I wrote a fairly extensive blog post about the poor ranking processes in two then-new reports from Education Week's Quality Counts series and from the UK Center for Business and Economic Research (CBER). Just as in many years past, these rankings are clearly problematic.
But that has not stopped a new report, released a few weeks ago by the Prichard Committee for Academic Excellence and the Kentucky Chamber of Commerce, from citing both the Quality Counts and CBER material as though it were high-grade "stuff."
That’s pretty disappointing, because we are not going to fix problems with Kentucky’s public education system if we don’t even establish a solid idea of how that system currently performs.
So, if you are interested in "The Rest of the Kentucky Education Performance Story," as the late Paul Harvey would have so nicely put it, click the "Read more" link for a more complete guide to Kentucky education.
Before discussing specific problems, let's review some general concerns about ranking state education systems. This will be old news for our regular readers, but these comments bear repeating since folks at EdWeek, the CBER, Prichard and the Chamber apparently still don't get it.
A major problem with ranking state education systems is that the student demographics in each state vary dramatically. For example, as shown in Table 1, in the 2015 National Assessment of Educational Progress (NAEP) Grade 4 reading assessment, Kentucky’s student population was dramatically different from the US average. Kentucky was 79 percent white while across the nation public school enrollment in 2015 was only 49 percent white. Black enrollment in Kentucky was only 10 percent, while blacks comprised 15 percent of the total nationwide. Hispanics only made up 5 percent of Kentucky’s student body, but nationally they made up 26 percent of the public school enrollment. For a bit more reference, whites in California comprised only 25 percent of the state’s public school enrollment in late winter of 2015 when the NAEP was administered.
TABLE 1
Coupled with these very dramatic differences in demographics are tremendous differences in test results for the different races, as shown in Table 2.
TABLE 2
These large achievement gaps create a major comparison problem when demographics differ sharply. Given Kentucky's overwhelmingly white enrollment, even though the Bluegrass State's whites scored lower than whites in either California or the nation as a whole, Kentucky gets a big advantage when the subgroup scores are averaged together using each jurisdiction's own enrollment percentages.
This isn’t news, by the way. NAEP Report Cards since 2005 have discussed issues that should be considered when comparing scores across states. These include differing student demographics, differing rates of exclusion of students from testing, and the fact that the NAEP is a sampled assessment, so all of the scores carry sampling errors and small differences between scores may not be statistically significant.
Of particular note, a special section of the NAEP 2009 Science Report Card that begins on Page 32 is titled, “A Closer Look at State Demographics and Performance.” It actually uses Kentucky as an example of how impressions about a state’s performance can change notably once the scores are disaggregated by race. Figure 32 from that NAEP report shows that overall Kentucky outscored the national average by a statistically significant amount. However, when the scores are considered by race, Kentucky’s whites, who comprised more than 80 percent of the state’s public school enrollment in 2009, actually scored statistically significantly lower than the national average for whites.
Let’s explore this with more recent data found in Tables 1 and 2 above. If each state’s weighted average score on the NAEP 2015 Grade 4 Reading Assessment is computed for the three racial groups shown in Tables 1 and 2, Kentucky’s fourth grade reading scores are notably higher than either the US average or California’s, as Table 3 shows. The difference is probably large enough to be statistically significant.
TABLE 3
However, look at what happens if we use each state’s scores for whites, blacks and Hispanics but we weight those scores by the demographic percentages found in Kentucky. Table 4 shows the results.
TABLE 4
Wow! Kentucky isn’t well ahead of the US average or California at all. The only reason Kentucky looks better in overall score comparisons is an unfair advantage created by our state’s very different student demographics. Keep in mind that NAEP Grade 4 reading is where Kentucky shows its best performance. Things look a lot worse for Kentucky when we consider math, which I will discuss a little later.

By the way, for technical types, the well-understood mathematical fact of life outlined above even has a name: “Simpson’s Paradox.” Simpson’s Paradox tells us that examining only overall average scores can hide some really interesting surprises that become apparent once the different subgroups that go into the average are considered separately.
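For readers who want to see the mechanics, here is a minimal sketch of the reweighting behind Tables 3 and 4. The demographic shares are the ones cited above; the subgroup scores in the sketch are hypothetical placeholders chosen only to illustrate the effect, not the actual Table 2 values, so plug in the real NAEP Data Explorer numbers to reproduce the tables.

```python
# A minimal sketch of the reweighting behind Tables 3 and 4.
# NOTE: the subgroup scores below are HYPOTHETICAL placeholders chosen only
# to illustrate the effect; the demographic shares are the ones cited above.

# Enrollment shares for the three groups discussed (white, black, Hispanic).
shares = {
    "Kentucky":   {"white": 0.79, "black": 0.10, "hispanic": 0.05},
    "US average": {"white": 0.49, "black": 0.15, "hispanic": 0.26},
}

# Hypothetical NAEP-style subgroup scale scores (illustration only).
scores = {
    "Kentucky":   {"white": 227, "black": 202, "hispanic": 212},
    "US average": {"white": 231, "black": 206, "hispanic": 208},
}

def weighted_average(group_scores, group_shares):
    """Average subgroup scores, weighting by the given enrollment shares."""
    total = sum(group_shares.values())  # shares omit other groups, so normalize
    return sum(group_scores[g] * w for g, w in group_shares.items()) / total

for place in scores:
    own_mix = weighted_average(scores[place], shares[place])         # Table 3 style
    ky_mix = weighted_average(scores[place], shares["Kentucky"])     # Table 4 style
    print(f"{place}: own demographics = {own_mix:.1f}, "
          f"Kentucky's demographics = {ky_mix:.1f}")

# With these illustrative numbers, Kentucky leads when each place is weighted by
# its own demographics, but falls behind once both are weighted by Kentucky's
# mix, which is the Simpson's Paradox effect described above.
```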
Both Quality Counts and the CBER report fall right into the Simpson’s Paradox trap. And, neither Prichard nor the Chamber apparently knows that.
The Quality Counts and CBER reports use overall “all student” scores from the NAEP, carried out to a ridiculously fine level of one tenth of a point (remember, NAEP scores have sampling errors and are nowhere near that precise). As a result, both reports portray biased pictures of winners and losers in state-to-state education performance that simply are not correct.
Here are a few specific comments about each of the reports:
CBER’s “Kentucky’s Educational Performance & Points of Leverage,” Issue Brief 19, January 2016
- The vast majority of the report is built around NAEP scores for “all students.” There is no correction for the Simpson’s Paradox issue discussed above, which biases all of the test score comparisons in Kentucky’s favor.
- Carrying out NAEP analysis to the nearest tenth of a point is dubious. NAEP just isn’t that precise.
- The NAEP science scores used by the CBER are now years out of date. Including them in a 2015 analysis further inflates the findings in Kentucky’s favor.
- Small technical issue: the CBER report says, “The NAEP data reflect the percentage of public students scoring proficient or higher, and the U.S. data represents the National Public.” In fact, a spot check of the data in the NAEP Data Explorer web tool indicates the US numbers for both fourth and eighth grade science are for the nation as a whole and include non-public school results.
- Comparing Kentucky’s high school graduation rate to other states’ is dubious because the criteria for awarding diplomas are not standard from state to state. As I have explained elsewhere, there is good evidence that Kentucky is doing a lot of social promotion to a diploma, which does not indicate readiness for either college or career. That apparent social promotion inflates Kentucky’s real graduation rate picture. I suspect many other states have similar issues, but I don’t think there is a way to consistently compute the amount of social promotion in other states.
- Further evidence of the social promotion problem comes from what are labeled the “ACT % College/Career Ready (2015)” numbers in the CBER report. Kentucky might graduate more students, but the percentage ready for college and careers is notably lower than the national average. On a technical note, it appears these numbers are actually reported by ACT, Inc. as “Students Who Met All 4 ACT Benchmark Scores” for college readiness, as shown in Figure 1.1 of that organization’s “ACT Profile Report” for Kentucky for 2015. That is not the same as Kentucky’s “College and Career Ready” statistic. Most importantly, there is controversy about whether the ACT is a solid indicator of career, as opposed to college, readiness. CBER should not relabel ACT statistics without explanation.
- The CBER ranks ACT Composite Scores for all 50 states. That is not valid – even ACT, Inc. recommends against doing this because the proportions of graduates taking the ACT in each state vary dramatically and are not valid random samples. In 2015 only 10 percent of Maine’s graduates took the ACT, while in Kentucky and 12 other states all graduates were tested. Kentucky’s results can be compared to the other 12 states that tested all students, but comparisons to the remaining 37 states are statistically invalid.
Education Week’s Quality Counts 2016
It is remarkable how dramatically Kentucky’s ranking in Quality Counts has bounced around in just a few years. When Quality Counts ranked states in the 2013 report, somehow Education Week’s report team convinced itself that Kentucky’s education system ranked 10th in the nation. Just one year later, in 2014, Education Week listed Kentucky in 35th place! In 2015 Kentucky placed 29th (though the map in that report carries an erroneous 2013 annotation; it is not the same map as the actual 2013 map). In 2016 Kentucky supposedly improved to 27th place. That’s quite a lot of jumping around, from 10th to 35th to 29th to 27th place between 2013 and 2016, and it mostly demonstrates that Quality Counts’ ranking schemes are highly unstable from year to year and that trends based on them should be regarded as dubious, at best.
- Just like the CBER report, Quality Counts develops its ranking in part based on “all student” NAEP scores, ensnaring this report in the Simpson’s Paradox trap.
- As with the CBER report, Quality Counts also makes too much of very small, statistically insignificant NAEP score differences, reporting scores that actually have sampling errors of several points as though they are meaningful to the nearest tenth of a point.
- Also as with the CBER report, Quality Counts includes high school graduation rates in its rankings, which is clearly problematic for reasons previously mentioned.
One thing is certain: reports that don’t consider things like differing student demographics are not going to provide accurate pictures of true relative performance of state education systems. That’s just the way things are, but it seems a lot of people issuing “research” in this area simply don’t know, or don’t want to admit, that.
So, to close, here is a ranking example that does allow for both the sampling errors and the demographic issues in NAEP scores. Our regular readers have seen the next two figures before, but they bear repeating. It is hard to understand how Kentucky could rank even as high as 35th place when its predominant student group, white students, does so poorly in NAEP eighth grade math.
Figure 1 shows how Kentucky stacked up in the NAEP 2015 Grade 8 Math assessment. As you can see, the Bluegrass State’s white students were bested by whites in 42 other states plus Washington, DC schools. Kentucky’s whites only did statistically significantly better than whites in two other states. Again keep in mind that about 80 percent of Kentucky’s total public school enrollment is white.
FIGURE 1
If you are interested in trends, here is how things looked back in 2011.
FIGURE 2
Yes, that is right. In 2011 our whites outscored whites in three other states and were outscored by whites in 39 other states plus Washington, DC.
So, between 2011 and 2015, the number of states where whites outscored Kentucky’s whites rose from 39 plus Washington, DC to 42 plus Washington, DC. In turn, Kentucky outscored one fewer state in 2015 than in 2011.

Here is one last shocker for you.
Way back in 1992, Kentucky’s NAEP Grade 4 Reading proficiency rate for “all students” was 23 percent, while the nationwide average was 27 percent. BUT, don’t forget those statistical sampling errors! When you run a statistical significance test on these results with the NAEP Data Explorer, it turns out they are not statistically significantly different. So, if we use the “all student” scores that the CBER and Quality Counts want us to use, Kentucky’s reading performance way back in the early days of KERA was not significantly different from the national average. Given that starting point, Kentucky hasn’t made much relative progress since.
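For readers curious what that significance test actually involves, here is a minimal sketch of the underlying logic: the score difference is compared against the combined sampling error of the two estimates. The Data Explorer handles this internally with standard errors estimated from NAEP’s sampling design; the standard errors in this sketch are assumed round numbers purely for illustration.

```python
# A minimal sketch of the kind of two-group significance check the NAEP Data
# Explorer performs. The standard errors below are ASSUMED round numbers for
# illustration; real NAEP standard errors come from its sampling design.
from math import sqrt

ky_pct, ky_se = 23.0, 2.0   # Kentucky 1992 Grade 4 reading proficiency (SE assumed)
us_pct, us_se = 27.0, 1.0   # national average (SE assumed)

diff = ky_pct - us_pct
se_diff = sqrt(ky_se**2 + us_se**2)   # standard error of an independent difference
z = diff / se_diff

# If |z| stays below roughly 1.96, the 4-point gap cannot be called
# statistically significant at the 95 percent confidence level.
print(f"difference = {diff:.1f} points, z = {z:.2f}, "
      f"significant at 95%: {abs(z) > 1.96}")
```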
Tech Notes

Data source for NAEP scores in Tables 1 through 4 and for production of Figures 1 and 2:
NAEP Data Explorer online tool
Updated July 2, 2016 to make minor edits and add bullet about CBER's invalid ranking of ACT scores.