What you need to consider when comparing state education systems

As I wrote yesterday, we are seeing all sorts of reports computing rankings of state education systems. However, those reports are using approaches that don’t comply with recommendations found in various documents from the National Center for Education Statistics (NCES).

Yesterday’s blog discussed two of those poorly conducted reports, the Quality Counts 2016 report from Education Week and the University of Kentucky’s Center for Business and Economic Research’s Issue Brief #19. Last night I came across another example, a blog from the Prichard Committee for Academic Excellence that makes exactly the same sort of comparison mistakes.

All of these reports blatantly ignore guidance from the NCES regarding pitfalls to avoid when comparing test results from different states, and as a consequence they wind up portraying an inaccurate picture of Kentucky’s true performance. To see what should be considered when ranking state education systems – factors those doing the rankings often ignore – click the “Read more” link.

Let’s see what NCES says needs to be considered when you compare state education systems.

Consider different student demographics

Over the years, the NCES has made many comments about the impact of different student demographics on differences in National Assessment of Educational Progress (NAEP) scores from state to state. This isn’t a new issue, and by now reports that compare education results across states should be accounting for this important factor.

For example, the 2005 NAEP Math Report Card comments on how different student demographics can impact overall NAEP scores, saying:

“In comparing states to one another, it is important to consider that overall averages do not take into account the different demographics of the states’ student populations.” NAEP 2005 Mathematics Report Card, Page 14.

The 2007 NAEP Math Report Card added to this, saying:

“Changes in performance results over time may reflect not only changes in students’ knowledge and skills but also other factors, such as changes in student demographics….” NAEP 2007 Math Report Card, Page 7.

NCES said still more in the 2011 NAEP Math Report Card:

“The performance of students in individual states should be interpreted in the context of differences in their demographic makeup. For example, the proportions of students from different racial/ethnic groups reported in NAEP varied widely across states in 2011.” NAEP 2011 Math Report Card, Page 25.

A special section in the NAEP 2009 Science Report Card provided a specific example from Kentucky related to this important topic. That section first says:

“It is helpful to examine the differences between how a state performs overall and how students within a demographic group in that state perform. Some might assume that states that score above the national average would have student groups that exhibit similar performance, but that is not necessarily true.” NAEP 2009 Science Report Card, Page 32

That is where a specific example of how considering demographics changes the picture for Kentucky can be found. The discussion is based on data in Figure 32 from the report card (reproduced below), which I also showed you in yesterday’s blog.

KY White Science Results from 2009 NAEP Report Card Fig 32

Notice how the relative impression of Kentucky performing better than the national average changes when we consider how its white students compare to whites across the nation.

So, here is the bottom line: reports that only look at overall "all student" scores from the NAEP, or any other education assessment, can produce seriously misleading impressions about the true relative performance of states’ education systems. NCES has made this clear over the years, and examination of the actual data shows it is especially true for Kentucky. The state has maintained a far more stable student population, predominantly white and English speaking, while nationwide the demographics have shifted toward much larger proportions of minority students. Because, as we discussed in yesterday’s blog, minority students score significantly lower on assessments like the NAEP, Kentucky’s large white population gives it an unfair advantage in any simplistic comparison of “all student” test scores.
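To make the arithmetic behind that concrete, here is a minimal sketch. It uses made-up subgroup scores and enrollment shares, not actual NAEP data, to show how a state’s demographic mix alone can push its “all student” average above the nation’s even when its student groups perform no better than their national counterparts.

```python
# Hypothetical illustration (not actual NAEP data) of how demographic mix
# alone can make a state's "all student" average look better than the
# nation's even when each student group scores exactly the same.

def overall_average(group_scores, group_shares):
    """Weighted 'all student' average from subgroup scores and shares."""
    return sum(score * share for score, share in zip(group_scores, group_shares))

# Assumed subgroup scores (white, minority) -- identical for state and nation.
scores = [230, 205]

state_shares  = [0.85, 0.15]   # a state with a predominantly white enrollment
nation_shares = [0.55, 0.45]   # a nation with a much larger minority share

print(overall_average(scores, state_shares))   # 226.25
print(overall_average(scores, nation_shares))  # 218.75
# The state "beats" the nation by 7.5 points even though its white and
# minority students score exactly the same as their national counterparts.
```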

Consider exclusion and accommodation rate issues (Note – SD refers to students with disabilities and ELL refers to English language learner students):

“Variations in exclusion and accommodation rates, due to differences in policies and practices for identifying and including SD and ELL students, should be considered when comparing student performance across states. States and jurisdictions also vary in their proportions of special-needs students, particularly ELL students. While the effect of exclusion is not precisely known, comparisons of performance results could be affected if exclusion rates are markedly different among states.” NAEP 2009 Science Report Card, Page 6

Kentucky used to have one of the highest exclusion rates of learning disabled students of any state in the nation, especially on past NAEP reading assessments. The problem first surfaced in the 1998 NAEP Grade 4 Reading Assessment, when Kentucky’s exclusion rate for learning disabled students soared from just four percent of the entire raw sample NAEP wanted to test in the previous 1994 testing to 10 percent of the entire raw sample. Nationwide, the average exclusion rate was six percent in 1998 (see the table on Page 163 in the 1998 NAEP Reading Report Card).

Obviously, if you exclude a large number of students, who as a group are going to have low scores, the overall average score for a state will be inflated.
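A quick back-of-the-envelope calculation, again with made-up numbers rather than actual NAEP results, shows the size of that inflation effect:

```python
# Hypothetical illustration of how excluding low-scoring students inflates a
# state's reported average. All numbers are invented for this example.

included_mean  = 225.0   # assumed mean score of students who actually take the test
excluded_mean  = 190.0   # assumed mean the excluded students would have earned
exclusion_rate = 0.10    # 10 percent of the raw sample excluded

# Average the state would post if everyone in the raw sample were tested.
true_mean = (1 - exclusion_rate) * included_mean + exclusion_rate * excluded_mean

print(f"Reported average: {included_mean:.1f}")                 # 225.0
print(f"Average if no one were excluded: {true_mean:.1f}")      # 221.5
print(f"Inflation from exclusion: {included_mean - true_mean:.1f} points")  # 3.5
```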

In the most recent NAEP test cycles for 2013 and 2015, however, Kentucky’s exclusion rates for reading have dropped considerably, though they remain above the national average. For example, in 2015 fourth grade reading Kentucky’s total exclusion rate (as a part of the total raw sample NAEP wanted to test) was 3.5 percent while the nationwide public school average was 1.7 percent. For eighth grade reading the exclusion rates were 3.1 percent and 1.5 percent, respectively (see the NAEP 2015 Technical Appendix for Reading, Page 24).

So, when it comes to exclusion, you do need to consider which year of NAEP data you are looking at when you compare Kentucky to other states. While some other states continue to have unusually high exclusion rates, this has become less of an issue for Kentucky, although its exclusion rates still run somewhat above the national average in the most recent NAEP results.

Consider sampling issues

A lot of studies totally ignore the fact that the NAEP is a sampled assessment. That means every result carries plus-or-minus statistical sampling error. So, when scores are fairly close from state to state, the best the NAEP can tell us is that education performance in those states is essentially tied.

Nevertheless, we see many studies, such as Quality Counts and the CBER reports, that report NAEP scores to the nearest tenth of a point to do their rankings, when the truth is that these scores are accurate to no better than plus or minus several points, at best. What Quality Counts and the CBER tell us are “wins” and “losses” are in actuality just ties. The shallow and simplistic approach in these reports is therefore misleading.
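Here is a minimal sketch, using hypothetical scores and standard errors rather than real NAEP numbers, of the kind of check that should precede any claim that one state “beat” another on a sampled assessment:

```python
# Rough sketch (made-up numbers, not actual NAEP results) of asking whether
# the gap between two sampled state averages is bigger than the combined
# sampling error, before declaring a "win" for either state.
import math

def significantly_different(mean_a, se_a, mean_b, se_b, z=1.96):
    """Approximate two-sided test at ~95% confidence for independent samples."""
    gap = abs(mean_a - mean_b)
    combined_se = math.sqrt(se_a**2 + se_b**2)
    return gap > z * combined_se

# Two hypothetical states separated by 1.3 points, each with a ~1.0-point
# standard error -- the sort of gap simplistic rankings treat as a real "win."
print(significantly_different(241.3, 1.0, 240.0, 1.0))  # False: a statistical tie
print(significantly_different(248.0, 1.0, 240.0, 1.0))  # True: a real difference
```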

Sampling issues also affect other tests like the SAT and ACT. A number of states don’t require these tests, so the percentage of graduates taking these college entrance exams varies considerably from state to state.

In 2015, for example, ACT, Inc. reported that 13 states, including Kentucky, tested 100 percent of their high school graduates. In sharp contrast, the same report shows Maine tested only 10 percent of its graduates. It makes no sense to try to compare Maine’s results to Kentucky’s under those conditions.
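A simple simulation, with invented scores rather than real ACT data, illustrates why: when only a self-selected, college-bound slice of graduates takes the test, the state’s average is pushed up even if its students are no better prepared.

```python
# Hypothetical sketch of the self-selection problem. All numbers are made up
# for illustration and are not real ACT results.
import random

random.seed(0)
graduates = [random.gauss(20, 5) for _ in range(100_000)]  # one state's "true" scores

# Case 1: every graduate is tested, as in Kentucky.
tests_everyone = sum(graduates) / len(graduates)

# Case 2: suppose only the top-scoring 10 percent choose to sit for the test.
top_tenth = sorted(graduates, reverse=True)[: len(graduates) // 10]
tests_top_tenth = sum(top_tenth) / len(top_tenth)

print(f"Average when 100% are tested: {tests_everyone:.1f}")        # about 20
print(f"Average when only 10% self-select: {tests_top_tenth:.1f}")  # far higher
# Identical students, wildly different "state averages" -- which is why
# comparing a 10 percent tested state to a 100 percent tested state makes no sense.
```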

Unfortunately, as the Common Core State Standards’ promise that all state tests would become comparable to one another crumbles, about the only cross-state data available comes from the NAEP and, to a much lesser degree, from the ACT. We can certainly learn from those tests, but considerable care is required to develop an accurate picture of what is really happening. So far, few reports are providing that necessary attention to detail, and the picture they paint of Kentucky’s performance is therefore rather dubious.