Education Program Reviews: What will state leaders do?

I first wrote about problems with new additions to Kentucky’s Unbridled Learning school accountability program – known as the “Program Reviews” – some time ago. These problematic additions are self-scored by schools and thus are ripe for the same kind of inflation that made Kentucky’s old Writing Portfolio element in the now defunct Commonwealth Accountability Testing System (CATS) and Kentucky Instructional Results Information System (KIRIS) assessments a constant source of questionable data.

Concerns about the Program Reviews surfaced again last week, becoming an interesting topic of discussion during the April 1, 2015 meeting of the Kentucky Board of Education.

Very simply, results from the state’s first three new education Program Reviews, part of a set that eventually will include five separate reviews covering:

• The Arts and Humanities

• Practical Living/Vocational Studies

• Writing Instruction

• World Languages and

• The functioning of the Primary Program (Kindergarten to Third Grade)

don’t look trustworthy. Schools well-known for better performance sometimes got low Program Review scores while other schools with notoriously bad reputations had the audacity – or maybe just a lack of real understanding on the part of their less proficient staff – to self-award high scores for the quality of their education programs.

This is especially true for the Writing Program Review, where we can compare results to the scores students receive for On-Demand Writing during KPREP testing. For numerous Kentucky schools, the two sets of scores simply don’t line up (see more on that if you click the “Read more” link).

Program Reviews are a growing problem because, starting with the 2013-14 school term, the first three Program Reviews now count in Kentucky’s Unbridled Learning school assessment formulas. Inflation (and, in some cases, deflation) in the Program Review scores not only provides bad information to policymakers and the public, but those errors also reduce the overall validity of the final scores from Unbridled Learning. The error could be considerable. At one point, the five proposed Program Reviews were going to count for as much as 30 percent of the overall Unbridled Learning score. The state board is now looking at reduced weighting, but giving any weight to these easily inflated items will certainly impact Unbridled Learning scores overall.

The Kentucky Board of Education is going to have to develop some better answers. Until that happens, it looks like scores from Unbridled Learning will probably suffer from more and more “Unbridled Inflation.” That isn’t going to help our students get better educations.

It also sets the stage for the same sort of problems that ultimately doomed the credibility of Kentucky’s former KIRIS and CATS assessments.

Over the years Kentuckians have learned that some things in the state’s education system simply are not suitable for analysis through student testing. Examples include the arts and humanities, some vocational studies and practical living subjects. Kentucky tried evaluating those areas with tests under its now defunct KIRIS and CATS school assessment programs, but the results were never really satisfactory.

We ran into more problems with the old writing portfolio program used in both KIRIS and CATS. While writing portfolios are well-regarded as a classroom instructional tool, portfolios proved to be a major disappointment as school accountability items.

For one thing, arcane rules for writing instruction were introduced to prevent teachers from providing excessive help and thereby inflating student scores. However, those rules actually hampered the effective teaching of writing.

For another thing, the only practical way to find enough affordable and qualified talent to grade all that writing was to have teachers score their own students’ work. But teachers and their schools were being held accountable for those scores. Human nature took over: every single audit performed on the Writing Portfolio program found that in too many cases the scores had significant inflation. The results inflated the overall KIRIS and CATS scores for schools. Worse, everyone got inaccurate information about the real quality of writing instruction in Kentucky.

Senate Bill 1 was introduced in Kentucky’s 2009 Regular Legislative Session to fix problems with CATS. One feature of the bill was an attempt to do something about the dubious results and possible adverse consequences from tests in subjects like the arts and from the writing portfolios.

Thus was born the idea of “Program Reviews.” These reviews would use experts to examine how well programs in things like writing portfolios and the arts were doing in each school. Student scores would not be considered. In most subject areas, no tests would be used.

The Program Review plan seemed like an interesting idea, but there was an unfortunate hitch – WHO would conduct all those reviews?

In the end, fiscal realities took over. Ignoring the lessons of human nature from the old KIRIS and CATS writing-portfolios-in-assessment mess, Kentucky’s education leaders decided that local school staff would do their own Program Reviews. This decision was almost sure to cause problems, and it didn’t take long for the first evidence to appear.

As soon as the results were released from the first Unbridled Learning Writing Program Reviews, which were conducted in 2013, I compared those results to the scores students had achieved on the Language Mechanics tests from KPREP. As I blogged at that time, the results looked very problematic. Some of the schools with top scores for Language Mechanics scored their apparently rather successful writing programs very harshly. Meanwhile, some of the very worst performing schools for Language Mechanics gave themselves superior scores in their Writing Program Review. Bluntly put, in too many cases things just didn’t look credible.

Now, there is more evidence that the Program Reviews are indeed inflated and untrustworthy.

During the April 1, 2015 Board of Education meeting, three representatives from Kentucky’s arts community spoke about problems with the instruction of the arts in the state. During this discussion, concerns were raised about the questionable impact on arts instruction due to the shift to easily inflated Program Reviews for the arts. The fear was that inflated Program Reviews would hide real problems and also result in a de-emphasis on arts instruction.

At one point, a board member asked if we could use the Program Reviews for the arts to determine which schools were not doing a good job for students. Kentucky Commissioner of Education Terry Holliday responded with a very cautionary comment about his confidence in the Program Review scores, saying:

“We’re not 100 percent certain those ‘Distinguished’ ratings are ‘Distinguished.’”

That got me thinking that it was time to update my Writing Program Review analysis from last year.

This time, I compared actual scores from the 2013-14 On-Demand Writing tests given as part of KPREP to the Writing Program Review scores for the same school term.

The results don’t look very good.

Logically, you would expect schools with strong writing programs to produce better results on On-Demand Writing tests. However, I found plenty of examples of schools that gave themselves high marks for their writing programs despite receiving some of the very lowest On-Demand Writing scores (scoring of On-Demand Writing comes from testing contractors, not teachers).
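For readers who would like to run a similar check themselves, the basic comparison is easy to reproduce. Here is a minimal sketch in Python using pandas; the file name and column names are my own placeholder assumptions, not the actual layout of the Kentucky Department of Education data or of my Excel spreadsheet.

```python
# Sketch only: flag schools that rank near the bottom for On-Demand Writing
# yet awarded their own writing program a high Program Review rating.
# The CSV file name and column names below are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("writing_comparison_2013_14.csv")

# Rank schools from best (1) to worst by On-Demand Writing score.
df["writing_rank"] = df["on_demand_writing_score"].rank(ascending=False)

# Treat the bottom 10 percent of schools as the lowest performers.
bottom_cutoff = df["writing_rank"].quantile(0.90)

# Schools in the bottom 10 percent that still self-scored "Proficient"
# or "Distinguished" on the Writing Program Review.
suspect = df[
    (df["writing_rank"] >= bottom_cutoff)
    & (df["program_review_rating"].isin(["Proficient", "Distinguished"]))
]

print(suspect[["school_name", "writing_rank", "program_review_rating"]])
```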

For those of you not into numbers, here are some simple descriptions of a few of the things I found.

There are examples of questionable Writing Program scores for schools that had very high On-Demand Writing test scores. There are equally questionable results for schools that had very low On-Demand Writing test scores. Problems abound at the elementary, middle and high school levels.

For example, Jefferson County’s Shacklette Elementary and Roosevelt Perry Elementary, plus West Point Independent’s West Point Elementary School and Bell County’s Frakes School Center, all have On-Demand Writing performance at the very low end among the 690 elementary schools that had full data available to use in this comparison. However, all four self-scored their Writing Programs rather highly, each awarding itself a "Proficient" rating.

Henry County’s Eastern Elementary School doesn’t rank much better for the actual writing capability of its students. It placed way down at 634th from the top among the 690 elementary schools with 2013-14 data. However, Eastern's staff awarded themselves a top-drawer "Distinguished" score for their Writing Program. That is very hard to believe given that far fewer than one in five students in this school produced acceptable On-Demand Writing performance.

My Excel worksheets show similar issues among the 329 middle schools with usable data. More than a dozen of the schools placing in the bottom 10 percent for On-Demand Writing performance awarded themselves "Proficient" scores for their Writing Programs. Though ranking a lowly 286th out of 329 for its students' On-Demand Writing scores, Casey County Middle School awarded its Writing Program a "Distinguished" level grade.

A dozen of the very lowest performing high schools for On-Demand Writing apparently don't think their programs have problems, either, as they also awarded themselves "Proficient" level scores for their Writing Programs. One high school, Casey County High School, also garnered the audacity award, giving itself a "Distinguished" Writing Program score even though its On-Demand Writing test score average ranked only 15 places above the worst performing high school in the entire state.

For those of you into numbers, I ran a standard statistical analysis known as a correlation to see how the Program Review scores related to the On-Demand Writing scores. With this analysis, when two sets of numbers closely track one another – which we would expect if the quality of writing programs were being accurately reflected in actual student writing scores – a number close to 1.0 would result. If there wasn’t much of a relationship, the number would be close to zero.
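My analysis was done in Excel, but for anyone who prefers code, the same calculation takes only a few lines. The sketch below computes a Pearson correlation for each school level; again, the file name and column names are placeholder assumptions, not the real data layout.

```python
# Sketch only: Pearson correlation between self-scored Writing Program Review
# results and contractor-scored On-Demand Writing results, by school level.
# The CSV file name and column names are hypothetical placeholders.
import pandas as pd

scores = pd.read_csv("writing_scores_2013_14.csv")

for level, group in scores.groupby("school_level"):
    r = group["program_review_score"].corr(group["on_demand_writing_score"])
    print(f"{level}: correlation = {r:.2f}")
```

In Excel, the CORREL function performs the same calculation on two columns of scores.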

As Figure 1 shows, the numbers for every school level are indeed low.

Figure 1: 2014 Writing Program Review Correlation to On-Demand Writing Scores

Regardless of school level, Figure 1 shows there isn’t much relationship between how Kentucky schools self-evaluate their writing programs and what happens when students actually put pencil to paper.

There could be a number of explanations for why this disconnect is occurring.

It might be that the scoring of On-Demand Writing Tests in KPREP isn’t trustworthy. However, for this to be the case, the scoring could not just be too hard or too easy across the board. It would have to be highly unstable across all levels of student writing and unstable across schools. I have heard some complaints about the scoring of writing, but those complaints were that it was simply too hard. I have not heard complaints that the scoring is wildly unstable.

It could be that schools are making a good faith effort to conduct Program Reviews but the guidance from Frankfort on how to do that is seriously deficient and confusing. However, if this were true, and if schools really did understand what good writing programs looked like, then there should have been a lot of complaints about the scoring guidance. I am not aware of that happening. On the other hand, if schools don’t really know what good writing programs look like, there might not have been pushback, but students obviously would not be getting good writing instruction, either.

Unfortunately, it could also be that schools are not putting in a good faith effort on the Program Reviews and are inflating the results as much as they think they can get away with. Sadly, this seems the most likely scenario and fits with what seemed to be happening with the old Writing Portfolio items in KIRIS and CATS.

The bottom line for all of this is that human nature is going to trump the best intentions of those who want to evaluate all things with an assessment program. Whether the issue is staff just not knowing what good programs really look like, or staff wanting to score themselves as highly as they think they can get away with, the results for students are what really matter, and we cannot have claims of superior writing programs in schools where students clearly are far from superior writers.

The facts of life are that Kentucky cannot begin to afford enough qualified, out-of-state, unbiased graders to do all the Program Reviews required in the state’s well over 1,000 schools. And, the old writing portfolio audit program provides no encouragement that we can do enough audits to keep everyone on the up and up, either.

This leaves the Kentucky Board of Education to face some very tough decisions. Do they keep obviously “unbridled” Program Review scores in Unbridled Learning and risk a loss of confidence in the overall program such as happened with KIRIS and CATS, or do they drop (or reduce to only trivial levels) scoring of these problematic elements in the Unbridled Learning formula? As we enter the 25th year of the Kentucky Education Reform Act of 1990 and enter into a new baseball season, Kentucky’s school accountability score card already includes two strikeouts – the KIRIS and CATS assessments – and the count on Unbridled Learning’s Program Reviews isn’t looking all that good, either.

Note: I sent much of the message above along with the Excel spreadsheet I created to the head of the Kentucky Board of Education and several key members of the Kentucky Department of Education.

I’ll be happy to e-mail the Excel to anyone who asks for it.