My first surprise in the new NAEP results – And, it isn’t scores
Doubtless because so many people were trying to access the new 2022 National Assessment of Educational Progress (NAEP) results, the NAEP Data Explorer web tool was working very slowly earlier this week, so I have not been able to look at much so far.
The new results include Main NAEP and Trial Urban District Assessment (TUDA) NAEP results for Grades 4 and 8 math and reading. Main NAEP provides state-level scores and the TUDA NAEP covers scores for some of the nation’s largest school systems, Jefferson County Public Schools in Kentucky included.
The initial presentation of the scores in a National Press Club webinar Monday certainly painted a grim picture. In general, as I blogged earlier, scores are down almost across the board, in part, but not necessarily entirely, due to the COVID pandemic. As was pointed out in the past and mentioned during the press conference, NAEP scores in some cases had already turned flat or had even started to decline before COVID appeared.
As I looked through one of the few pieces of information I was able to access Monday, something other than scores popped up, and I am scratching my head about it.
Very simply, the numbers of students the NAEP actually tested in 2022, at least in the case of Grade 4 and Grade 8 reading, are down sharply from the numbers tested in 2019.
Table 1, which displays data from two different sources as shown by the links at the bottom of each table section, shows the story.
Table 1
Let’s start right at the top with the numbers of students tested across the entire nation. In the 2019 NAEP Grade 4 Reading Assessment, a total of 154,000 students were tested. In 2022, however, across the entire nation the Grade 4 Reading Assessment only tested 111,600 students.
Where did the other 42,400 students, a drop of 27.5%, go?
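For readers who want to check the arithmetic, here is a minimal Python sketch that uses only the two national sample counts quoted above. The function name `percent_drop` is my own, chosen for illustration.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the percentage decline from `before` to `after`."""
    return (before - after) / before * 100.0

# National Grade 4 reading sample counts quoted in this post.
tested_2019 = 154_000
tested_2022 = 111_600

missing = tested_2019 - tested_2022
drop = percent_drop(tested_2019, tested_2022)

print(f"{missing:,} fewer students tested, a drop of {drop:.1f}%")
```

The same function can be applied to any pair of state or district counts in Table 1.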
Laura LoGerfo, a spokesperson from the National Assessment Governing Board, says the samples were reduced in 2022 due to budgetary concerns and that nothing nefarious was involved.
That’s still a problem, because when NAEP tests smaller samples of students, the plus and minus sampling errors in the resulting scores increase. That in turn adds more blur to the picture NAEP provides. The increase in fuzz can lead to lots of confusion about whether any progress – or decay – actually occurred.
And, the changes also affected school districts in the TUDA NAEP. The reduction in tested kids between 2019 and 2022 ran as high as 50% in Cleveland for Grade 4 reading, where the sample size plummeted from 1,400 students in 2019 to only 700 in 2022.
These reductions had consequences. The plus and minus errors in all the NAEP reported scores increased, which decreases the sensitivity of the assessments to changes in performance.
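To get a rough feel for the size of that effect, suppose NAEP were a simple random sample, where the standard error shrinks in proportion to the square root of the sample size. That is an assumption made purely for illustration: NAEP uses a more complex sample design with weighting, so the real change in its standard errors will differ. Under that assumption, this sketch estimates how much a standard error grows when a sample shrinks (the function name `se_inflation` is hypothetical).

```python
import math

def se_inflation(n_before: int, n_after: int) -> float:
    """Approximate factor by which a standard error grows when the sample
    shrinks from n_before to n_after.

    ASSUMPTION: simple random sampling, so the standard error is
    proportional to 1 / sqrt(n). NAEP's real design is more complex.
    """
    return math.sqrt(n_before / n_after)

# Grade 4 reading sample counts quoted in this post.
national = se_inflation(154_000, 111_600)
cleveland = se_inflation(1_400, 700)

print(f"National:  standard errors about {national:.2f} times as large")
print(f"Cleveland: standard errors about {cleveland:.2f} times as large")
```

Note that cutting a sample in half does not double the error under this approximation; it raises it by a factor of the square root of two.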
The primary way NAEP reports its sampling errors is as standard errors. The rule of thumb is that there is a 95% probability that the true score (the score that would have resulted had all students, and not just a sample, been tested) would be within plus or minus two standard errors of the published score.
For Black students in Jefferson County, the NAEP Data Explorer reports that in the 2019 NAEP Grade 4 Reading Assessment, the standard error for the percentage of students reported as scoring Proficient or Above was 2.0. In 2022 it was 2.5.
So, the “fuzz” in the reported proficiency rates for Jefferson County’s Black students increased from plus or minus 4.0 points in 2019 to plus or minus 5.0 points in 2022. Any comparison to another district also must consider the fuzz in that district’s scores. Thus, the precision in the NAEP obviously has suffered a bit.
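The two-standard-error rule of thumb is simple enough to write down as a short Python sketch. The standard errors are the ones quoted above for Jefferson County's Black students, and the 13 percent proficiency rate is the 2019 figure cited later in this post. The function names are mine, and the factor of 2 is the rule of thumb, not an exact statistical constant.

```python
def margin_of_error(standard_error: float, multiplier: float = 2.0) -> float:
    """Approximate 95% margin of error: multiplier times the standard error."""
    return multiplier * standard_error

def approx_interval(estimate: float, standard_error: float) -> tuple[float, float]:
    """Rough 95% range for the true value around a published estimate."""
    moe = margin_of_error(standard_error)
    return estimate - moe, estimate + moe

# Standard errors quoted in this post (Grade 4 reading, Proficient or Above).
se_2019 = 2.0
se_2022 = 2.5

print(f"2019 fuzz: plus or minus {margin_of_error(se_2019):.1f} points")
print(f"2022 fuzz: plus or minus {margin_of_error(se_2022):.1f} points")

# 2019 proficiency rate of 13 percent, as cited later in this post.
low, high = approx_interval(13, se_2019)
print(f"2019 rate of 13 could plausibly be anywhere from {low:.0f} to {high:.0f}")
```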
For a practical example, first check out how Black students in TUDA ranked against each other in the 2019 NAEP Grade 4 Reading Assessment, as shown in Figure 2.
Figure 2
Old hands with the Bluegrass Institute will be familiar with the format of Figure 2. The districts are simplistically ranked by their NAEP percentages of students who are at or above the Proficient standard as shown in the rightmost column in the figure. These percentages are listed without any correction for sampling errors.
The sampling error issues are addressed in the “Cross-Jurisdiction Significant Difference” column, which has different shades of blue to tell us which districts statistically significantly outscored, tied, or scored statistically significantly lower than Jefferson County, Kentucky, which is set as the “focal district.”
Numbers and symbols in the Cross-Jurisdiction Significant Difference column show how much higher or lower a given district’s proficiency rate was than Jefferson County’s.
For example, top listed Charlotte outscored Jefferson County for NAEP Grade 4 Reading proficiency by 10 points in 2019. That difference was statistically significant, so Charlotte gets dark blue shading and an up arrow in the Cross-Jurisdiction Significant Difference column.
At the other end of the table, Detroit is identified by its light blue shading and a down arrow to show its Black students’ proficiency rate was statistically significantly lower than Jefferson County’s in 2019.
All the other districts get medium blue shading with diamond symbols. That’s because the difference in their scores from Jefferson County’s 2019 Proficiency Rate of 13 isn’t large enough for us to be sure they really performed any differently from Jefferson County once the plus and minus statistical sampling errors present in all NAEP scores are considered.
So, what the NAEP really tells us is that in 2019, performance across almost all the TUDA districts compared to Jefferson County is a tie. But the 2019 NAEP was sensitive enough that a proficiency rate 10 points higher and one 8 points lower than Jefferson County’s were in fact significantly different.
Now, look what happened with the reduced sample sizes in the 2022 NAEP Grade 4 Reading Assessment. Figure 3 has that data.
Figure 3
Even though a district again has a NAEP Grade 4 Reading proficiency rate 10 points higher than Jefferson County and another district has a rate 8 points lower, NONE of the districts can be considered to have performed any differently from Jefferson County in 2022 after the sampling errors are considered.
That limits the takeaways from the NAEP.
This is the consequence when you don’t sample as many students and NAEP’s measurement errors grow.
So, as of 2022, NAEP cannot declare that the rather notable 10-point difference in Black students’ proficiency rates between Jefferson County and Hillsborough County in Florida, which amounts to about a year of extra learning, really exists at all. That’s not very precise.
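Here is a rough Python sketch of why the same 10-point gap can be significant in one year and not in another. It uses a textbook test for the difference between two independent estimates, which is my simplification; NAEP's own comparison procedure may apply additional adjustments. Jefferson County's standard errors (2.0 in 2019, 2.5 in 2022) come from this post. The other district's standard errors (3.0 and 5.0) are made up purely for illustration and are not NAEP figures.

```python
import math

def gap_is_significant(gap, se_a, se_b, critical=1.96):
    """Test whether a gap between two independent estimates is significant.

    Returns (z, significant), where z is the gap divided by the standard
    error of the difference, sqrt(se_a**2 + se_b**2).
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = gap / se_diff
    return z, abs(z) > critical

gap = 10.0  # proficiency rate gap in percentage points, as in this post

# 2019: Jefferson County SE from this post; other district SE is HYPOTHETICAL.
z_2019, sig_2019 = gap_is_significant(gap, se_a=2.0, se_b=3.0)

# 2022: Jefferson County SE from this post; other district SE is HYPOTHETICAL.
z_2022, sig_2022 = gap_is_significant(gap, se_a=2.5, se_b=5.0)

print(f"2019: z = {z_2019:.2f}, significant = {sig_2019}")
print(f"2022: z = {z_2022:.2f}, significant = {sig_2022}")
```

With these illustrative inputs, the gap clears the usual threshold in 2019 but falls short in 2022, even though the gap itself has not changed.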
And, that returns me to concerns surrounding the first table in this blog. What are the implications of Kentucky’s NAEP Grade 4 reading sample dropping from 3,200 students in 2019 to only 2,300 getting tested in 2022? Among kids NAEP selected to test in 2022, how many never showed up?
During the press conference on October 24, 2022, the Commissioner of Education Statistics, Peggy Carr, was asked a question I submitted about this (at the time, I had not assembled Table 1 and had to rely on memory, which told me something was different). She indicated the people running the NAEP had indeed considered this issue and didn’t see any problems.
Still, after seeing how dramatically – and unevenly – the NAEP samples declined all across the nation (40% fewer students tested in California versus just 22% fewer in South Dakota for Grade 4 Reading), I hope this gets more research.
At the very least, the NAEP’s precision was clearly further eroded by the decision to cut the number of students tested, and analysis going forward is going to have to be even more careful about considering the sampling errors, something that has always been necessary but very often not observed.
Does this mean there is no value in the NAEP? ABSOLUTELY NOT! We can still learn a lot from these important assessments. I’ll have some examples ready for you shortly, so stay tuned.
More on symbology in Figures 2 and 3 for the curious
Note in the Cross-Jurisdiction Significant Difference column that the small cross symbol for Jefferson County just shows it would make no sense to show this district scoring differently from itself. It does not mean the Jefferson County data is not applicable.
However, the double plus symbols for Albuquerque and Austin in the far-right column, where the Proficiency rate should appear, tell us there were problems with the student samples (probably too few Black students to derive reasonably accurate estimates), and scores are accordingly not reported. In consequence, the other columns for Albuquerque and Austin are filled with cross symbols to indicate that the data for those school systems is not applicable.
The next three columns to the right of the Cross-Jurisdiction Significant Difference column show how many districts scored statistically significantly higher, the same as, or statistically significantly lower than the district listed on each line.
Finally, the last column has the NAEP Proficiency Rate results.
Source Notes: The data in Table 1 comes from web addresses identified at the bottom of the 2019 and the 2022 data sections of the table. Figures 2 and 3 were assembled using the NAEP Data Explorer.