Tuesday, January 24, 2012

The best get worse, the worst get better

Only the mediocre are always at their best
~ Jean Giraudoux, French diplomat, dramatist, & novelist (1882 - 1944)

Suppose you get a group of people and give each person 6 coins to toss. And suppose that you class heads as a good result and tails as a bad result. As everyone starts tossing their coins, someone soon gets 6 heads and you praise them for getting such a good result and hold them up as an example to everyone else. But on their next throw they are back to throwing about the same number of heads as everyone else. Conversely, someone gets 6 tails and you criticize them for getting such a bad result and hold them up as an example to avoid. And on their next throw, they too are back to throwing about the same number of heads as everyone else.

What might you learn from this?
  • If you praise someone after good performance then they will just slack off
  • If you criticize someone after bad performance then they will improve
Well, you might learn these lessons, but given an obviously random process, it would be stupid to draw such conclusions, wouldn't it?

Yet every day, we draw such conclusions about the performance of those who work in our organisations. The fact is that everyone varies from day to day in their performance for reasons that have nothing to do with competence or motivation. One day someone achieves a personal best and we enthusiastically praise them, only to see their performance drop. Or someone 'achieves' a 'personal worst' and we have a 'quiet chat' with them and they improve. On a day-to-day basis each person's performance varies around their average performance, and the further it is from their average on any particular day, the less likely it is to be repeated the next day.

In terms of the coin example, a person has only about a 1.6% probability (1 in 64) of throwing 6 heads, so there is about a 98.4% chance that they will achieve a less extreme result on their next throw, and similarly for 6 tails. The technical term for this in statistics is 'regression to the mean'.
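The arithmetic behind those figures is easy to check. Here is a minimal sketch in Python, assuming fair coins and independent tosses; the function name prob_heads is just an illustrative choice.

```python
from math import comb


def prob_heads(k, n=6):
    """Probability of exactly k heads in n tosses of a fair coin."""
    return comb(n, k) / 2 ** n


p_six = prob_heads(6)  # 1 chance in 64
print(f"P(6 heads) = {p_six:.4f}")                          # prints 0.0156
print(f"P(fewer than 6 heads next time) = {1 - p_six:.4f}")  # prints 0.9844
```

The same function gives the figure for 6 tails, since 6 tails is simply 0 heads: prob_heads(0) is also 1 in 64.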

For example, in 1940, McNemar published a study in which he had measured the IQs of children in an orphanage on two occasions a year apart. He found that those who had scored highest the first time did not perform quite as well the second time around. Conversely, those who did not perform so well the first time round did better on the second occasion. And this was entirely due to regression to the mean.

In the workplace, a similar result can occur. We check a group of people's work and then look at those who performed worst. We take these people aside and coach them, and the next time we check their work, they are doing better. But is this due to the training or due to regression to the mean? Was their poor performance atypical of their average performance? Who knows? Sometimes what we think is happening isn't happening at all.

We could do an experiment: divide the poor performers into two groups at random, coach only one of the groups, and then see whether both groups improve. And we might then find that our training was a waste of time and resources. But most organisations are loath to experiment and as a result they deprive themselves of the opportunity to learn what works and what doesn't.
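To see what such an experiment might show, here is a rough simulation in Python. Everything in it is an assumption made for illustration: a made-up scoring scale, a stable underlying ability for each person, independent day-to-day noise, and coaching that has no effect at all (coaching_effect=0.0). It is a sketch of the idea, not a model of any real workplace.

```python
import random
from statistics import mean


def simulate_coaching_experiment(n_people=1000, coaching_effect=0.0, seed=1):
    """Return the average score change for coached and uncoached poor performers."""
    rng = random.Random(seed)

    # Assumption: each person has a stable underlying ability, and each
    # check of their work adds independent day-to-day noise.
    ability = [rng.gauss(50, 5) for _ in range(n_people)]
    first_check = [a + rng.gauss(0, 10) for a in ability]

    # Pick the worst 20% on the first check and split them at random.
    worst = sorted(range(n_people), key=lambda i: first_check[i])[: n_people // 5]
    rng.shuffle(worst)
    half = len(worst) // 2
    coached, control = worst[:half], worst[half:]

    def average_gain(group, effect):
        gains = []
        for i in group:
            second_check = ability[i] + effect + rng.gauss(0, 10)
            gains.append(second_check - first_check[i])
        return mean(gains)

    return average_gain(coached, coaching_effect), average_gain(control, 0.0)


coached_gain, control_gain = simulate_coaching_experiment()
print(f"Average gain, coached:   {coached_gain:.1f}")
print(f"Average gain, uncoached: {control_gain:.1f}")
```

Under these assumptions both groups improve by a similar amount, even though the coaching does nothing. Without the uncoached group to compare against, that improvement would look like evidence that the coaching worked.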

This doesn't mean that we ignore poor performance. It does mean that to be fair to the people who work in our organisations, we need to track performance over time and not just focus on single points of extremely good or extremely bad performance.

In any group of people if we do a measurement at a single point in time, there will always be someone who performs best and someone who performs worst. We need to be careful that we don't give undue weight to what may turn out to be a regression effect. This is both fairer to individual workers and better for the performance of the organisation as a whole.



The example of IQ measurement in an orphanage is from A Primer on Regression Artifacts (Donald Campbell & David Kenny)

For an actual example of regression to the mean in performance among pilots in the US Air Force see:

Regression to the Mean in Flight Tests (Reid Dorsey-Palmateer and Gary Smith)

An experiment you can do yourself:
Generate 20 random numbers in each of two columns of a spreadsheet, then sort the rows from lowest to highest by the first column, ensuring that each number in the first column remains with its partner in the second column. In general, you will find that the partner of the lowest number in the first column is not the lowest in the second column, and similarly for the partner of the highest number in the first column. If we imagine these numbers to be scores on people's work at two different points in time, the worst performer has improved their relative ranking, while the best performer has worsened their relative ranking. Yet clearly the data were generated randomly.
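If you would rather not use a spreadsheet, the same experiment can be sketched in Python. This version repeats it many times and counts how often the lowest (or highest) scorer in the first column is also the lowest (or highest) in the second. The function names, the number of trials and the seed are arbitrary choices for illustration.

```python
import random


def one_trial(rng, n=20):
    """Generate n pairs of random scores and sort the pairs by the first score."""
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    rows.sort(key=lambda row: row[0])
    second_scores = [row[1] for row in rows]
    lowest_stays_lowest = rows[0][1] == min(second_scores)
    highest_stays_highest = rows[-1][1] == max(second_scores)
    return lowest_stays_lowest, highest_stays_highest


def run_trials(trials=2000, seed=0):
    """Fraction of trials in which the extreme scorer stays extreme."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    low_fraction = sum(low for low, _ in results) / trials
    high_fraction = sum(high for _, high in results) / trials
    return low_fraction, high_fraction


low_fraction, high_fraction = run_trials()
print(f"Lowest in column 1 is also lowest in column 2:   {low_fraction:.1%}")
print(f"Highest in column 1 is also highest in column 2: {high_fraction:.1%}")
```

Because the two columns are unrelated, the worst scorer in the first column should turn out to be the worst in the second only about 1 time in 20. The rest of the time they appear to have 'improved'.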
