The Bluegrass Institute for Public Policy Solutions


Is Kentucky’s school testing in science going into a time warp?

Vague NextGen Science Standards are creating serious problems as Kentucky struggles to find a way to assess them

Other states take note: Kentucky is taking the lead on creating new science assessments

I was present at the Kentucky Board of Education’s “Retreat” meeting yesterday – I think. Or, maybe I was at a meeting back around 1995. It was hard to tell.

The board was discussing how to assess the new Next Generation Science Standards (NGSS). These standards are so vague and unconventional that no one really has a handle on how – or even if – they can be assessed. Certainly, the people who wrote the NGSS never thought through the assessment issue (something really good standards writers understand is critical). This lack of attention to detail is creating a lot of problems for the Kentucky Department of Education.

But that isn’t stopping Kentucky’s educators from trying. Unfortunately, faced with a new set of assessment challenges, many unknowns and no real research thanks to the poorly crafted NGSS, Kentucky’s proposals to cope are mostly warmed-over versions of old Kentucky Instructional Results Information System (KIRIS) notions. Those KIRIS ideas were tried – and failed badly – during the 1990s. As explained below, the plans to assess the NGSS in Kentucky seem almost inevitably headed for real trouble.

The proposals for NGSS assessments really do bring back bad memories of Kentucky’s assessment history.

Example: The board was told that the new science assessment needs to be “focused on problem solving, and include authentic performance tasks.”

Problem solving, Performance Tasks – it’s so déjà vu.

In the early 1990s, Kentuckians were told their then-new KIRIS tests would evaluate students’ problem-solving ability with lots of open-response written-answer questions and what were then called “Performance Events.” These ideas were new and untried in 1990. They are not new now. And, they have a history.

Performance Events were the great hope in the early days of KIRIS testing. In fact, it was hoped that Performance Events eventually would totally replace multiple-choice tests and possibly most of the open-response written questions, too.

A typical KIRIS fourth-grade performance event gave a team of four students a sheet of paper printed with a number of cartoon ladybug images. Students were given drawing tools such as rulers and compasses and were asked to develop a way to determine how many of those images were on the paper.

The question’s creators anticipated that students would do something like fold the paper into fourths so they only had to count a portion of the total images and could then extrapolate to determine the total number on the paper. Of course, by the fourth grade, I suspect a number of teams would simply count all the images, ticking them off with a pencil to prevent double-counting, and arrive at an exact answer in minimal time. That would not take much higher-order thinking, but it does show how difficult it actually is to create really useful questions of this type.
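For readers who want the intended arithmetic spelled out, here is the extrapolation approach in miniature as a few lines of Python. Every number is hypothetical; the actual KIRIS item’s image counts are not given here.

    # All numbers here are hypothetical; suppose the sheet held 48 ladybug images.
    count_in_one_fourth = 12        # what a team might count after folding the page into fourths
    estimated_total = count_in_one_fourth * 4
    print(estimated_total)          # 48, the intended "extrapolate from a sample" answer
    # Counting all the images one by one gives the same answer exactly, with no
    # estimation step at all, which is the loophole described above.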

Sadly, reality quickly caught up with the dream for Performance Events. In only four years, Kentucky found out that performance-type items are impossible to sustain in an accountability testing environment. Test creators were unable to come up with new performance event items that sampled the same material at the same level of difficulty, so valid trend lines could not be maintained in the scores. Performance Events in KIRIS: Born 1992-93, died 1996.

Example: Another old testing term made a ‘ghostly’ reappearance in the meeting. Because there is so much material in the science area, the state board was told there is no way to test each student on every topic. The proposed fix is to “matrix” the test questions. That is education-ese for having a fairly large question bank but giving each student only a small portion of all the questions. The result is that different students take notably different tests. That can make a student’s score something of a luck-of-the-draw proposition. It can even skew overall school-level scores in small-enrollment situations.

Kentucky’s panel of national testing experts actually commented on this very issue in a February 22, 2005 letter, saying that with matrixing:

“…there is no assurance students are in fact evaluated against the exact same set of skills.”

As a consequence, the letter continues:

“…there is enough error in student level scores to advise caution in interpreting and using these scores – and reason to not base high-stakes student consequences on these scores.”

[Catterall, James S., and Poggio, John, “Requested Responses to OAC Questions (Senate Joint Resolution 156),” letter dated February 22, 2005.]

Here’s a simple example of how matrixing creates problems. Let’s say your child really likes biology but hates chemistry. If a youngster gets more questions from the biology area, the score will be notably higher than if the same child gets more chemistry questions from the matrix. Either way, you cannot rely on matrix-based test results to fairly represent a student’s real ability in science. Testing experts have already said so.
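To make that luck-of-the-draw concern concrete, here is a minimal simulation sketch in Python. Every number in it is an assumption for illustration only – a 60-question bank split evenly between biology and chemistry, 12-question forms, and a student who answers biology items correctly 90 percent of the time but chemistry items only 50 percent of the time – none of which reflects any actual Kentucky test design.

    import random

    # Hypothetical question bank: (topic, this student's chance of answering that item correctly).
    BANK = [("biology", 0.9)] * 30 + [("chemistry", 0.5)] * 30

    def matrix_form_score(form_size=12):
        """Draw one random matrix form; return this student's percent correct on it."""
        form = random.sample(BANK, form_size)
        correct = sum(random.random() < p_correct for _topic, p_correct in form)
        return round(100 * correct / form_size)

    # Ten different randomly drawn forms for the very same student:
    print([matrix_form_score() for _ in range(10)])

Run it a few times and the same student’s scores can easily swing by 20 points or more from form to form, purely because of which questions happened to be drawn. Real matrix-sampled programs typically apply statistical equating to soften this, but, as the experts’ letter notes, the student-level error never fully goes away.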

Example: Another ghost from the KIRIS past also surfaced. Again thanks to the huge amount of material in science, it was mentioned that one end-of-year test would not be sufficient. Instead, “Through Course Assessments” will also have to be given to students. These will probably take the form of “Performance Tasks,” which sound awfully similar to those long-failed KIRIS Performance Events.

Doing portions of state testing during the school term immediately creates a guessing game for science teachers. Teachers will have to guess when each science element they are supposed to cover during the year will actually come up on a Through Course Assessment, because the NGSS provides no guidance on sequencing. If teachers guess wrong, scores for all their kids will be low.

One way around this puzzle would be to test only content that students were supposed to receive in the previous school year. However, that would hold this year’s teacher hostage to last year’s teacher’s performance. The teachers union is going to love that – NOT.

Example: Due to funding realities, Through Course Assessments, which will be given at several points throughout the school year, will have to be graded by the students’ own teachers.

That grading plan sounded entirely too much like the old KIRIS writing portfolio program. Under that program, another failed assessment element from the past, students assembled examples of their writing over the course of the entire school year, and their own teachers then graded the assembled documents. This failed as an assessment element because teachers, knowing their schools would be judged on the grades they awarded, gave in to human nature and inflated the scores. A number of scoring audits were performed over the lifetime of the writing portfolio program; every single audit revealed significant inflation in teacher-awarded scores. A solution for teacher self-grading inflation was never found.

More recently, the first round of Program Reviews from Kentucky’s current Unbridled Learning school accountability program provides more evidence that teacher self-grading does not work. As I wrote a few months ago in “Program review: Will this approach do?”, a comparison of the scores from the teacher-graded Writing Program Reviews with the scores from separate writing mechanics testing shows compellingly that too many teachers are going to seriously game any self-grading program.

By the way, those writing portfolios also took up a huge amount of class time. Through Course Assessments will likewise eat up a currently unknown amount of class time during the year.

Thus, thanks in part to serious deficiencies in the NGSS (I repeat: standards that don’t address their own assessment are incomplete and potentially very problematic), we now see an old and failed assessment idea – teacher scoring – coming back again, this time in science.

However, this time not just schools but also individual teachers are going to be held personally responsible through state evaluation of their students’ performance. Thus, you can almost certainly count on the score inflation that constantly occurred with the writing portfolio program coming back even stronger with the proposed science Through Course Assessments. Given the state’s long history of never finding a fix for such inflationary scoring, and given that evidence of such inflation has already surfaced in the current Writing Program Reviews under Unbridled Learning, you probably won’t be able to trust your own child’s science scores.

This is the sort of “stuff” that happens when education standards are not VERY carefully developed with an eye towards assessing how well the standards are actually being met.

Thanks to a disregard for Kentucky’s failed testing history, plus poorly prepared science standards that don’t consider the required assessment tools and related research, science testing in Kentucky could shape up to be a much bigger problem than the Common Core State Standards for math and English language arts.