Poor Misunderstood Standardized Tests

How dare we link teacher pay to test results when the people who use the tests don’t understand the tests?  Decision-makers who affect teachers where they live–in the classroom, and the pocket–need a math lesson in reading test scores.

There Are Definite Signs of Confusion

Everyone’s children are being affected: Better teachers are being under-ranked. Weaker teachers are over-ranked, and aren’t getting the mentoring and modelling that would benefit them and their students.

Administrators don’t “get” statistics.

Only by looking at a gain PER STUDENT can we best assess that student’s learning. The only way to be certain of each student’s gain each year is to give her/him the EXACT SAME test (or tests with questions so similar in nature as to be practically identical) at the beginning of the year and the end of the year. Compare Jane’s incoming September score with her outgoing June one.

That’s what I did for my students each year. They, and their parents, each got to see that they had gained at least a year, and often two, in Math and Reading skills.

The second-best way is to give the identical test at the same time of year one year apart. For example, September of 4th grade, and September of 5th grade.

Instead, we give entirely different questions one year apart.

We should all have these goals for standardized testing:

1. How well has each student learned the material (retained facts, and applied them)?
2. How well has each teacher taught the material?
3. How well has each school taught the material?

You can meet these goals simply, as long as a duplicate test given twice reflects the content and skills of the curriculum. Instead, administrators compare apples to oranges: they compare one grade level’s year-end test scores to a DIFFERENT group of students’ scores on different tests.

(For example, the June scores of the 4th grade class of 2012 to the June scores of the 5th grade class of 2013. The two tests cover entirely different material. The state may even have changed the 4th grade test content after the 5th graders took it the prior year!)

If we MUST follow this illogical testing method, let’s at least try to draw the best conclusions we can:

Small score differences (say, only 1-5%) should be ignored.

Why? There are so few test questions that getting just one or two more (or fewer) right makes this much difference in the score. We shouldn’t leap to big conclusions about Alberto’s progress from one year to the next based on 1 or 2 questions.
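To see the arithmetic, here is a small Python sketch of how far a single question moves a percent-right score. The test lengths used (20, 40, and 50 questions) are hypothetical examples, not the length of any particular state test; substitute your own.

```python
# A minimal sketch: how many percentage points do one or two questions move a score?
# The test lengths below are HYPOTHETICAL examples, not taken from any real state test.

def points_per_question(num_questions: int) -> float:
    """Percentage points gained or lost by getting a single question right or wrong."""
    return 100.0 / num_questions


def swing_from_questions(num_questions: int, questions_changed: int) -> float:
    """Score change, in percentage points, when a few answers flip between right and wrong."""
    return questions_changed * points_per_question(num_questions)


if __name__ == "__main__":
    for n in (20, 40, 50):
        one = swing_from_questions(n, 1)
        two = swing_from_questions(n, 2)
        print(f"{n}-question test: 1 question = {one:.1f} points, 2 questions = {two:.1f} points")
```

On a 40- or 50-question test, one or two answers land squarely in the 1-5 point range; on a 20-question test, a single answer is already worth 5 points.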

If a student has:

The EXACT SAME percent right in June as the prior June–Success!

That’s right. Contrary to what current administrators think, if a student’s score doesn’t go up AT ALL from one year to the next, that absolutely does NOT mean the teacher is incompetent. Remember, the material on the higher grade level’s test is more advanced.

Even if Kelley scored only 40% last year and the same this year, it means she successfully learned an entire year’s new material at the same level of competence as last year. If most of the students in a class match last year’s percent, the teacher is worthy. This is especially true if last year’s low scorers matched their previous scores, because it is especially hard to teach a year’s worth of material to students who begin the new grade with skills far below grade level.

A HIGHER score the second year?–Outstanding Success!

This means Ron was taught even MORE than a year’s worth. If several students achieved these gains, it likely indicates a strong teacher. (Especially if some of these gainers had low scores to begin with.)

A LOWER score year 2?–May STILL be a competent teacher.

Was last year’s score low? The teacher may have taught Lars a lot more than he knew before, but perhaps he was unable to learn a whole year’s worth because his skills began so low, or because of continuing problems at home. Or he may chronically choose to give low effort, and his family takes no action to reverse this.

If last year’s score was NOT low, a lower score the following year is a failure.

Moderate- or high-scorers of last year should be capable of learning a year’s worth at the same percent level of competence. If several such kids have their scores drop, the teacher may be at fault.

(But if only a handful have such drops, look for other causes. For example, did these students’ teachers last year have their classes use test dividers and ensure that no cheating occurred, or were low-effort students successfully copying answers from the class “stars”?)
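The cases above can be sketched as a small Python sorting rule for one student’s June-to-June change. This is an illustration under stated assumptions, not an official scoring method: the 5-point noise band follows the “ignore 1-5%” rule, but the 50% cutoff for a “low” prior score is a HYPOTHETICAL choice, and every score pair except Kelley’s 40 to 40 is invented.

```python
from collections import Counter
from enum import Enum

NOISE_BAND = 5   # score changes of 5 points or fewer count as "no change"
LOW_PRIOR = 50   # HYPOTHETICAL cutoff: below 50% last June counts as a "low" prior score


class Outcome(Enum):
    SUCCESS = "same percent as last June: success"
    OUTSTANDING = "higher percent: outstanding success"
    LOWER_FROM_LOW_START = "lower, but last year's score was low: teacher may still be competent"
    LOWER_FROM_SOLID_START = "lower from a moderate or high start: failure (check other causes if only a few)"


def classify(last_june: float, this_june: float) -> Outcome:
    """Sort one student's year-over-year change into the cases described above."""
    change = this_june - last_june
    if abs(change) <= NOISE_BAND:
        return Outcome.SUCCESS        # same competence on a harder grade-level test
    if change > 0:
        return Outcome.OUTSTANDING    # learned more than a year's worth
    if last_june < LOW_PRIOR:
        return Outcome.LOWER_FROM_LOW_START
    return Outcome.LOWER_FROM_SOLID_START


if __name__ == "__main__":
    # HYPOTHETICAL (last June %, this June %) pairs; only Kelley's 40 -> 40 comes from the text.
    class_scores = {"Kelley": (40, 40), "Ron": (60, 75), "Lars": (35, 25), "Maya": (80, 65)}
    tally = Counter(classify(prev, curr) for prev, curr in class_scores.values())
    for outcome, count in tally.items():
        print(f"{count} x {outcome.value}")
```

Tallying outcomes for a whole class, rather than judging one student, matches the point that several solid-start drops may implicate the teacher, while only a handful calls for looking at other causes.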

What’s an Administrator to Do?

How SHOULD we get summaries of whole class, grade-level, and school performance from individual student score differences? A couple of possible methodologies for rating teachers based on test scores of their students are given in the ADDENDUM.

Regardless of the fine statistical details, only after we look at what individual students gain or lose can we fairly compare results between teachers, schools, and districts in order to look for patterns and determine which teaching models and materials we want to either emulate or avoid.

And only once teachers no longer supervise the testing of their own students (unless they are filmed doing so) might we fairly consider relating salaries to test scores.

Of course, administrators don’t want to deal with reality, because reality is complicated. They will continue to use totally specious data to make incorrect decisions that adversely affect teachers, teaching, and your children.
Previous Post: The Pink-Tinged Ghetto
The Next Post: Why Do Teachers Think Testing Is Unfair?


ADDENDUM

Perhaps we could assign one “strong-teaching” point for every five test-score points of each student’s score gain (above that first insignificant 1-5%), and take away one strong-teaching point for every five test-score points lost per student. Then, look at the net sum of strong-teaching points for the class.
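A minimal Python sketch of the strong-teaching-point idea follows. It ASSUMES one reading of “above that first insignificant 1-5%”: subtract the 5-point noise band first, then award one point per full five points beyond it (so a 12-point gain earns 1 point). Other readings are possible, and the class scores are HYPOTHETICAL.

```python
NOISE_BAND = 5        # the first 1-5 points of change are ignored as noise
POINTS_PER_STEP = 5   # one strong-teaching point per full 5 test-score points beyond that


def strong_teaching_points(last_june: float, this_june: float) -> int:
    """Points for one student: positive for gains, negative for losses, 0 inside the noise band."""
    change = this_june - last_june
    beyond_noise = abs(change) - NOISE_BAND
    if beyond_noise <= 0:
        return 0
    steps = int(beyond_noise // POINTS_PER_STEP)
    return steps if change > 0 else -steps


def class_net_points(pairs: list[tuple[float, float]]) -> int:
    """Net sum of strong-teaching points across a class."""
    return sum(strong_teaching_points(prev, curr) for prev, curr in pairs)


if __name__ == "__main__":
    # HYPOTHETICAL class: (last June %, this June %) for each student.
    scores = [(50, 62), (50, 71), (70, 58), (60, 63)]
    print(class_net_points(scores))  # 1 + 3 - 1 + 0 = 3
```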


A better method might go something like this: “8 students had their test scores go up by 10 points or more, but 2 students had their scores drop by that much. Everyone else’s scores stayed about the same (ignoring the small 1-5% differences). So, that teacher had a net gain of 6 students who learned more than a year’s worth.”

Using this method, this teacher’s rating would be +6, which I would rank “very good”. A rating of +0, meaning that all scores stayed the same or that an equal number went up as went down, would indicate “competent”. If a teacher had a high proportion of plus scores (students who improved) each year, that would be an outstanding teacher, particularly if some of these students showed large point gains. Similarly, if a teacher earned “only” a “competent”, but earned it year after year and had few students lose ground each year, that would indicate “highly competent.”
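The net-student count can be sketched in Python as follows. Only two anchors come from the text (+0 is “competent”, +6 is “very good”); treating every net of 6 or more as “very good”, and the wording of the other bands, are HYPOTHETICAL. Changes between 6 and 9 points are simply not counted in this sketch, since the example only names 10-point moves. The change list is invented to match the 8-up, 2-down example.

```python
BIG_CHANGE = 10  # "went up (or dropped) by 10 points or more"


def net_students(changes: list[float]) -> int:
    """Students who gained 10+ points minus students who lost 10+ points."""
    gainers = sum(1 for c in changes if c >= BIG_CHANGE)
    losers = sum(1 for c in changes if c <= -BIG_CHANGE)
    return gainers - losers


def describe(net: int) -> str:
    # Anchors from the text: +0 -> "competent", +6 -> "very good".
    # HYPOTHETICAL: the other bands and their wording are illustrative guesses.
    if net >= 6:
        return "very good"
    if net > 0:
        return "between competent and very good"
    if net == 0:
        return "competent"
    return "below competent"


if __name__ == "__main__":
    # HYPOTHETICAL class matching the example: 8 big gains, 2 big drops, the rest small changes.
    changes = [12, 15, 10, 11, 20, 14, 10, 18, -10, -13, 2, -3, 0, 4]
    net = net_students(changes)
    print(f"{net:+d}: {describe(net)}")  # +6: very good
```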

This system seems to reflect reality, and to do justice to the marvelous competence of many of today’s teachers, better than anything I have heard or seen from the lips or pens of administrators.


Is my methodology valid? Real statisticians are paid to figure this stuff out. It may be that whatever method is chosen, we should adjust results based on the level of the entire class. If test scores of the majority are particularly low to begin with, is it harder to achieve a large point gain? Unlike in a gifted class, the students aren’t learning much from each other. If those classes see a test score increase, maybe those teachers should get a whopping bonus.