Thursday, January 26, 2012

Compared to what?

The judgements that we make can at times depend on the subtlety of the measures we use.

Consider an insurance company where the employees have some latitude to waive the excess on a policy if the person making the claim has mitigating circumstances. Each person may make decisions on a variety of different policies (home, contents, car, boat, accident, personal liability, etc.).

Now suppose you wanted to judge whether any of your employees were being overly generous in waiving excesses.

One measure you might use is simply to look at the proportion of claims a person processed during some time period for which they waived the excess. You might compare this to the overall proportion waived across all claims processed by all employees, and target those employees who deviated too much from this average. However, this is a very crude measure, since it fails to take into account the types of claims each individual is processing.

A more refined measure might be to take the average proportion waived by all employees for each individual claim type and then calculate what each individual's waive rate would have been had they waived at these rates for the claims they actually processed. This is a fairer measure, since it takes into account that the waive rate might differ between different kinds of claims. And again, you might target those employees for whom the deviation between actual and expected waive rates was too great. However, if one of your employees is processing only a single type of claim, they may be making the largest contribution to the waive rate for that claim type, so you would effectively be comparing that person against themselves, with little chance of a major deviation.
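
To make the refined measure concrete, here is a minimal Python sketch of the expected-versus-actual waive rate calculation. The claim records, employee names and claim types are hypothetical, purely for illustration.

```python
from collections import defaultdict

# Hypothetical claim records: (employee, claim_type, excess_waived)
claims = [
    ("alice", "car", True), ("alice", "car", False), ("alice", "home", True),
    ("bob", "boat", False), ("bob", "boat", True), ("bob", "car", False),
    ("carol", "home", True), ("carol", "home", False), ("carol", "car", False),
]

# Overall waive rate for each claim type, pooled across all employees
waived_by_type = defaultdict(int)
total_by_type = defaultdict(int)
for employee, claim_type, waived in claims:
    total_by_type[claim_type] += 1
    waived_by_type[claim_type] += waived
type_rate = {t: waived_by_type[t] / total_by_type[t] for t in total_by_type}

# Each employee's actual waive rate versus the rate expected if they had
# waived at the pooled rates for the mix of claims they actually processed
for employee in sorted({c[0] for c in claims}):
    own = [c for c in claims if c[0] == employee]
    actual = sum(c[2] for c in own) / len(own)
    expected = sum(type_rate[c[1]] for c in own) / len(own)
    print(f"{employee}: actual={actual:.2f} expected={expected:.2f} deviation={actual - expected:+.2f}")
```

Note that if an employee is the only person processing a particular claim type, the pooled rate for that type is driven largely by their own decisions, which is exactly the self-comparison problem described above.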

Both of these methods compare what an individual is doing against what everyone else is doing, so it is a comparison against a norm of behavior rather than against any objective measure of what should be happening. For all you know, most employees might be less generous than you would like, so the person who appears more generous might actually be doing exactly what you want. However, the attraction of these methods is that they are simple to implement.

The more important question, however, is what the intended outcome of this policy is, and whether what your employees are doing is achieving that outcome. Every time an employee waives the excess it is a cost to your company, so presumably you want an outcome that will compensate for this cost. Otherwise, you would just be throwing money out the window. You would need to establish some metrics against which to measure the success of your policy (e.g. customer loyalty as measured by policy cancellations, customers taking out additional policies, etc.).

In the case of increased customer loyalty, you might gain an increase in policies due to word of mouth, or conversely be able to reduce your advertising costs. Whatever measure you use, you should be able to weigh the costs of the policy to your company against the benefits gained, and this in turn might give you a clearer indication as to how generous your employees should be. This would take a lot more gathering and analysis of data but would allow a more targeted approach. You might find that waiving the excess for one type of claim results in no benefit at all, or that conditions (e.g. 10 years without a claim) need to be placed around a policy type in order to realise any benefits.
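
As a rough sketch only, and with numbers that are entirely my own assumptions, a per-claim-type cost-benefit comparison might look something like this; the waive counts, average excess amounts, renewal figures and renewal values would all have to be estimated from your own data.

```python
# Entirely hypothetical per-claim-type figures, for illustration only
policy_effects = {
    # claim_type: (waives_per_year, avg_excess_waived, extra_renewals_per_year, value_per_renewal)
    "car":  (400, 600.0, 300, 900.0),
    "home": (150, 500.0,   5, 700.0),
}

for claim_type, (waives, excess, renewals, value) in policy_effects.items():
    cost = waives * excess
    benefit = renewals * value
    print(f"{claim_type}: cost=${cost:,.0f}  benefit=${benefit:,.0f}  net=${benefit - cost:,.0f}")
# A claim type with a consistently negative net figure is a candidate for
# tightening the waiver conditions rather than policing individual employees.
```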

The major differences between this approach and the approach based on averages are:
  • In this approach, the policy has a clear purpose and you are more interested in whether the total effect of employees' actions is achieving that purpose. In the 'average based' approach, you are interested only in how employees compare with each other (whether or not this benefits the business).
  • This approach provides information that enables you to evaluate and refine the policy itself. The 'average based' approach allows you to change the behavior of individuals, but such change might actually adversely affect the business.
In other words, one approach is looking at the big picture of the business's actions and its customers' reactions, whereas the other is treating the business as if the customers don't matter at all. It seems like a no-brainer as to which approach to use, but all too often managers employ the easier option rather than investing time and effort in analysis and refinement or, more importantly, in thinking through what a policy is intended to achieve.

Tuesday, January 24, 2012

The best get worse, the worst get better

Only the mediocre are always at their best
~ Jean Giraudoux, French diplomat, dramatist, & novelist (1882 - 1944)

Suppose you get a group of people and give each person 6 coins to toss. And suppose that you class heads as a good result and tails as a bad result. As everyone starts tossing their coins, someone soon gets 6 heads and you praise them for getting such a good result and hold them up as an example to everyone else. But on their next throw they are back to throwing the same number of heads as everyone else. Conversely, someone gets 6 tails and you criticize them for getting such a bad result and hold them up as an example to avoid. And on their next throw, they are back to throwing the same number of heads as everyone else.

What might you learn from this?
  • If you praise someone after good performance then they will just slack off
  • If you criticize someone after bad performance then they will improve
Well, you might learn these lessons, but given an obviously random process, it would be stupid to draw such conclusions wouldn't it?

Yet every day, we draw such conclusions about the performance of those who work in our organisations. The fact is that everyone varies from day to day in their performance for reasons that have nothing to do with competence or motivation. One day someone achieves a personal best and we enthusiastically praise them, only to see their performance drop. Or someone 'achieves' a 'personal worst' and we have a 'quiet chat' with them and they improve. On a day-to-day basis, each person's performance varies around their average performance, and the greater the variance from their average on any particular day, the less likely it is that it will be repeated the next day.

In terms of the coin example, a person has only a 1.6% probability of throwing 6 heads, so there is a 98.4% chance that they will achieve a less extreme result on their next throw, and similarly for 6 tails. The technical term for this in statistics is 'regression to the mean'.
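
You can see this for yourself with a quick simulation. The following is a minimal Python sketch (my own illustration, not part of the original argument) that tosses six coins twice for a large group and looks at how the 'praised' and 'criticised' people fare on their second attempt.

```python
import random

random.seed(1)
people = 10_000
first = [sum(random.random() < 0.5 for _ in range(6)) for _ in range(people)]
second = [sum(random.random() < 0.5 for _ in range(6)) for _ in range(people)]

praised = [s for f, s in zip(first, second) if f == 6]      # got 6 heads first time
criticised = [s for f, s in zip(first, second) if f == 0]   # got 6 tails first time

print("P(6 heads) =", 1 / 2**6)                                               # about 1.6%
print("praised, average next throw:   ", sum(praised) / len(praised))        # close to 3: they 'got worse'
print("criticised, average next throw:", sum(criticised) / len(criticised))  # close to 3: they 'got better'
```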

For example, in 1940, McNemar published a study in which he had measured the IQs of children in an orphanage on two occasions a year apart. He found that those who had scored highest the first time did not perform quite as well the second time around. Conversely, those who did not perform so well the first time round did better on the second occasion. And this was entirely due to regression to the mean.

In the workplace, a similar result can occur. We check a group of people's work and then look at those who performed worst. We take these people aside and coach them, and the next time we check their work, they are doing better. But is this due to the training or due to regression to the mean? Was their poor performance atypical of their average performance? Who knows? Sometimes what we think is happening isn't happening at all.

We could do an experiment: divide the poor performers into two groups, coach only one group, and then see whether both groups improve. We might then find that our training was a waste of time and resources. But most organisations are loath to experiment, and as a result they deprive themselves of the opportunity to learn what works and what doesn't.
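
Before running such an experiment for real, you could sketch it in code. The following simulation uses entirely hypothetical ability and measurement-noise figures, and the 'coaching' deliberately has no effect at all; both halves nevertheless appear to improve, purely through regression to the mean.

```python
import random

random.seed(2)
ability = [random.gauss(50, 5) for _ in range(200)]    # each worker's underlying ability

def measure(skill):
    # A single day's observed performance: ability plus day-to-day noise
    return skill + random.gauss(0, 10)

first_round = [(i, measure(s)) for i, s in enumerate(ability)]
worst = sorted(first_round, key=lambda x: x[1])[:40]   # the 40 worst first-round scores
coached = {i for i, _ in worst[::2]}                   # coach every second one (no real effect here)

def average_change(group):
    return sum(measure(ability[i]) - score for i, score in group) / len(group)

print("coached improvement:  ", round(average_change([w for w in worst if w[0] in coached]), 1))
print("uncoached improvement:", round(average_change([w for w in worst if w[0] not in coached]), 1))
```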

This doesn't mean that we ignore poor performance. It does mean that to be fair to the people who work in our organisations, we need to track performance over time and not just focus on single points of extremely good or extremely bad performance.

In any group of people if we do a measurement at a single point in time, there will always be someone who performs best and someone who performs worst. We need to be careful that we don't give undue weight to what may turn out to be a regression effect. This is both fairer to individual workers and better for the performance of the organisation as a whole.



The example of IQ measurement in an orphanage is from A Primer on Regression Artifacts (Donald Campbell & David Kenny)

For an actual example of regression to the mean in performance among pilots in the US Air Force see:

Regression to the Mean in Flight Tests (Reid Dorsey-Palmateer and Gary Smith)

An experiment you can do yourself:
Generate 20 random numbers in each of two columns of a spreadsheet and sort the rows by the first column from lowest to highest, ensuring that each number in the first column stays with its partner in the second column. In general, you will find that the partner of the lowest number in the first column is not the lowest in the second column, and similarly for the partner of the highest number in the first column. If we imagine these numbers to be scores on people's work at two different points in time, the worst performer has improved their relative ranking, while the best performer has worsened their relative ranking. Yet clearly the data was generated randomly.
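
The same experiment can be done in a few lines of Python instead of a spreadsheet; this is just a sketch of the procedure described above.

```python
import random

random.seed(3)
pairs = [(random.random(), random.random()) for _ in range(20)]  # two 'scores' per person
pairs.sort(key=lambda p: p[0])                                   # sort by the first score only

second_sorted = sorted(p[1] for p in pairs)
rank_of_lowest = second_sorted.index(pairs[0][1]) + 1    # where the first-column 'worst' sits on the second measure
rank_of_highest = second_sorted.index(pairs[-1][1]) + 1  # where the first-column 'best' sits on the second measure
print("worst on the first measure ranks", rank_of_lowest, "of 20 on the second")
print("best on the first measure ranks", rank_of_highest, "of 20 on the second")
```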

Sunday, January 22, 2012

The downward flow of mediocrity

Some years ago a study was done which found significant differences between effective managers and successful managers. Successful managers were defined as those who gained rapid promotion relative to their length of time with the company, whereas effective managers were those who actually did the work to make the company work.

Whereas successful managers spent most of their time on networking and politics, effective managers spent more time communicating with the people who reported to them, in planning and in getting things done. The sad thing is that the successful managers were promoted more rapidly than the effective ones.

One of the many dangers that organisations experience is when those who lead them are mediocre in their performance. Because of their mediocrity, they do not deal with the mediocre performance of those who report to them, whether this is because they do not recognise that anything is wrong or because it is too much trouble to deal with. And as a result they find themselves surrounded by mediocrity. And the mediocrity continues to flow ever downwards. Such managers find the mediocrity of those around them comfortable since there is no risk of anyone performing better than themselves and it allows them to hide their lack of performance.

The one thing they fear is someone who reports to them who is actually effective. An effective and conscientious manager who is surrounded by apathy and incompetence may end up trying to resolve issues outside their area of responsibility and as a result generate hostility from the less competent managers. They may also be seen as threatening by their own manager, since they ask the awkward questions that such a manager would rather not answer, and because they know where all the skeletons of things that have gone wrong are buried.

So, paradoxically, the manager who actually keeps the organisation afloat may be the very manager who is resented the most, all the more so because they cannot be removed: their contribution is the only thing that keeps the mediocrity of the more senior manager hidden.

I know of organisations where senior managers jump to making decisions without any data gathering or analysis, without considering the implications and consequences. And when a manager tries to put the brakes on and raises issues with what they are doing, they are seen as being obstructive.

Not a pretty picture, but this dynamic explains a lot of what has happened recently in the finance and airline industries, as well as the financial problems in the Eurozone: when people who are successful because of networking and politicking rise to positions of power for which they are ill-equipped, the results can be catastrophic.


Further reading:
Successful vs. Effective Real Managers (Fred Luthans)

Real Managers (Luthans, Hodgetts & Rosenkrantz)

Monday, January 16, 2012

Failure to Learn

A few months ago my mother was admitted to hospital with pneumonia and while she was in hospital two incidents occurred which highlight how standard treatments can block learning.

In the first incident, over several days while intensive antibiotics were being administered, my mother became delirious and paranoid. I did a little bit of research on the Web and found that certain antibiotics can cause such reactions (in fact several classes of antibiotics can do so). Now, to me, the logical response to this should have been to change the antibiotic. Instead her doctors added an anti-psychotic drug and a sedative. In other words, they added two more drugs to deal with the effects of the first one.

In the second incident, she was having problems maintaining blood oxygenation due to chronic heart failure. I did some further research on the Web (one of my degrees is in epidemiology, by the way) and found that the herbal supplement hawthorn has been proven to be beneficial - in fact I found a meta-analysis of 14 different research studies published in reputable medical journals that confirmed the benefit of hawthorn and the absence of side effects. I brought this to the attention of her doctor but he was unwilling for her to try this supplement. I presume this is because it is not a standard treatment in Australia (yet it is an accepted treatment in Germany).

In both of these cases, following a standard treatment regimen meant that there was no possibility of learning, in the first case failing to learn that a serious side effect of a drug could possibly be eliminated by switching the drug rather than adding additional drugs with potential side effects of their own. And in the second case, failing to learn that an accepted treatment in another Western country could conceivably be beneficial to patients in Australia.

The nett result of this is that patients in this country receive less than the optimal treatment. There is no telling how many deaths or complications occur annually as a result.

However, this isn't limited to medicine. About 18 months ago, I had an abscess in a tooth. My dentist said that I needed root canal treatment, which ended up costing me around $800. Presumably this is the standard treatment. A year later the same tooth became infected again, but instead of going to the dentist, I took a course of antibiotics and the infection cleared up with the tooth remaining sound. Had I gone back to the dentist, I imagine they would have said that it needed to be extracted.

But here's the thing: suppose that I had been prescribed a course of antibiotics the first time around instead of the root canal. And suppose the infection had cleared. Then this would have saved undermining the integrity of the tooth and possibly prevented the second infection. Is it likely that dentists will ever learn this? My guess is no: firstly, because an expensive treatment provides no incentive to try a much less expensive one; and secondly, because if they only ever use the technique they were taught, they have no opportunity to observe the effect of alternatives.

So what do these personal examples tell us about continuous improvement?

Firstly, that where there is an accepted way of doing things that 'works' or appears to work, it may be difficult to get someone to try something that could work better, especially where there is an incentive to continue with the current practice.

Secondly, people (including experts) are more likely to deal with symptoms than go back to root causes in solving problems. As a result, they add a further layer of complication which may obscure what is really happening.

Thirdly, the ready availability of quality information on the Web is no guarantee that it will be used. Natural human inertia will serve to keep things going the way they have always gone.

Fourthly, expertise and specialisation may blind someone to better possibilities in dealing with problems. What is standard in a profession may block learning of better ways of doing things.

Finally, there can be cultural barriers to improving methods, techniques and treatments. (In Anglophone countries, for example, herbal and traditional medicine are not held in high esteem, so effectively doctors discount the benefits of Chinese and Ayurvedic medicine, despite thousands of years of proven benefits.)

Sunday, January 8, 2012

Multiple causes, multiple consequences - Part 2

"Is there any other point to which you would wish to draw my attention?"
"To the curious incident of the dog in the night-time."
"The dog did nothing in the night-time."
"That was the curious incident," remarked Sherlock Holmes.


~ From "Silver Blaze" in "Memoirs of Sherlock Holmes" by Sir Arthur Conan Doyle


Noticing what didn't happen can be both difficult and important. Difficult because things that don't happen don't register on our senses, important because what is absent may provide important clues to what we need to do to either improve or conversely avoid disaster.

Consider the following examples:

Example 1: Edward Jenner's observation that milkmaids did not generally get smallpox led to his discovery of vaccination (from the Latin word for "cow"). By looking at people who did not get the disease, he derived a way of preventing it.
Example 2: During World War II, the patterns of bullet holes in returning aircraft were being studied to determine where the aircraft should be reinforced. The statistician Abraham Wald, however, had the insight that the bullet holes in surviving aircraft were clearly non-fatal and that it was the areas without bullet holes which were more likely to need reinforcing, and this was confirmed by studying the wreckage of planes that had been shot down. (Note: this is an over-simplification; for Wald's original work see the links below.)
Similarly, by noticing what information is missing, we may prevent ourselves from making bad attributions in relation to causality.

Consider the following example:

Suppose that in one group people follow strategy A, which leads to major failure 90% of the time and outstanding success 10% of the time. And suppose that in a second group, they follow strategy B, which rarely leads to major failure but generally leads to moderate success. If the only information available to us were the outstanding successes produced by strategy A, we might falsely conclude that strategy A is better than strategy B.
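
A small simulation, with payoff figures that are purely my own assumptions, shows how looking only at the successes of strategy A gives the wrong answer:

```python
import random

random.seed(4)
N = 10_000

def strategy_a():
    return 100 if random.random() < 0.10 else -50   # outstanding success 10% of the time, major failure otherwise

def strategy_b():
    return 10 if random.random() < 0.90 else -5     # usually a moderate success, rarely a small failure

a = [strategy_a() for _ in range(N)]
b = [strategy_b() for _ in range(N)]

print("best observed outcome   A:", max(a), "  B:", max(b))          # A looks far better...
print("average outcome         A:", sum(a) / N, "  B:", sum(b) / N)  # ...but B is the sounder strategy
```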

Jerker Denrell has published numerous articles on precisely this issue: how studying only successful organisations and individuals provides a misleading picture of the types of strategies that lead to success. For example, in Predicting the Next Big Thing he argues that making an accurate prediction about an extreme event may in fact be an indication of poor judgement. It may simply be an indication that the person makes extreme judgements in general and that this time they got lucky. But unless we look at the full picture, we may conclude that such a person is some kind of genius.

So, we need to ask ourselves:
  • What am I not seeing that I should be seeing?
  • What is this person/organisation's track record like (i.e. not just their personal best)?
  • Is this attribute or characteristic common to failures as well as successes?
  • What isn't happening in this situation?
  • What didn't happen that was critical?
  • What information am I missing that is necessary to make a valid judgement?
  • What information do I need to collect and analyse to see what is really going on?
This last question is extremely important.

There are organisations where none of the following are documented: why a decision was made, what information and analysis it was based on, and what the outcomes and consequences were. As a result there is virtually no capacity within such an organisation to learn from errors or refine decision-making, nor is there any accountability - a recipe for mediocrity.

The way to avoid this is to be vigilant in documenting what was done (and thus what may in retrospect be seen to have been overlooked) and to look for not just what happened but what didn't happen. It is the missing piece of the jigsaw that is needed to show the full picture.


"Psychology and Nothing" Eliot Hearst ( American Scientist Volume 79) is an interesting article which discusses the perceptual and cognitive difficulties of seeing what isn't there (Unfortunately, I haven't been able to locate a free version of this paper on the Web)

Failure is a Key to Understanding Success (Stanford GSB News, January 2004)

The Weirdest People in the World (Henrich, Heine and Norenzayan): posits that most psychology research is based on a very narrow sample - people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies - and that as a result the findings may not be as generalisable to human beings in general as usually thought.

A Method of Estimating Plane Vulnerability Based on Damage of Survivors (Abraham Wald)
Abraham Wald's Work on Aircraft Survivability (Mangel and Samaniego)

Tuesday, January 3, 2012

Multiple causes, multiple consequences - Part 1

“What is Fate?” Mulla Nasrudin was once asked by a scholar.
“An endless succession of intertwined events, each influencing the other”, he replied.
The scholar raised a sceptical eyebrow: “I can’t accept that. I believe in cause and effect”.
“Very well”, said the Mulla, and drew his attention to a procession of people leading a man to be hanged. “Is that man going to die because someone gave him the money that let him buy the knife he used for murder, or because someone saw him do it, or because nobody stopped him?”
~Idries Shah

As the above story shows, causality isn't a simple concept. There may be factors that contributed to the particular event but which could equally have resulted in no serious consequences. There may have been inhibiting factors that ceased to act. There may have been a confluence of factors, each of which had little or no impact on its own but which together were disastrous. A situation may have been in a delicate balance that was tipped by something of comparatively minor importance (the straw that broke the camel's back, the butterfly effect).

Sometimes responsibility can be muddied by questions about causality:
Three people, Alfred, Bob and Charlie, were crossing a desert and stopped at an oasis when night fell. Alfred hated Charlie and decided to kill him, so in the middle of the night, while the others slept, he got up and poisoned the water in Charlie's canteen. Bob also wanted to kill Charlie and, not knowing that Charlie's canteen had already been poisoned, got up in the early hours of the morning while the others slept and made a hole in Charlie's canteen, so that the water slowly leaked out. The next morning the three went their separate ways and a few days later Charlie died of thirst. Who was the murderer - Alfred or Bob? Or to put it another way: who caused Charlie's death?
In a case like this, it is clear that if neither had acted, Charlie might still be alive. Yet neither individually caused his death. He didn't die of poisoning, so Alfred is not individually responsible, yet if Bob had not acted, Charlie would have died anyway as a result of Alfred's actions.

This kind of thing happens all the time in organisations. One person fails to act to prevent the problem or to put controls in place that would have identified it early enough to ameliorate it, another person acts in a way that would have been harmful to the organisation, except that a third person's action neutralised their contribution without preventing the final disastrous outcome. So who is responsible? In the end what purpose is served by assigning blame? Perhaps it is better to work out what factors facilitated and inhibited the final outcome and how the balance changed so that that outcome occurred.

There are a lot of questions we can ask ourselves to try and determine what happened:
  • Why did this problem occur?
  • Why did it happen now?
  • What previously prevented it from happening?
  • What occurred that hadn't previously occurred?
  • What didn't happen that would usually have happened? (e.g. maybe a person was absent whose actions would normally have prevented the problem)
  • Was there a catalyst (i.e. something that either promoted or inhibited what occurred while not being directly involved)?
  • What near misses previously occurred?
  • What did we do about them, if anything?
  • How much of this was wishful thinking? For instance, did we put controls in place when we really had no idea of what the causes were in the hope that doing something was better than doing nothing?
But essentially, we need to move away from a simplistic:
A caused H
towards a more nuanced understanding of causality:

{A, B and C} plus {the presence of D} plus {the absence of E, F and G} caused H.
It is only when we can understand a situation in this way that we can effectively deal with it and deal with its deepest roots rather than its superficial symptoms.

One of the most difficult aspects of this is recognising what didn't happen. This will be the subject of Part 2 of this article.


  • A film worth watching to see how complex and difficult to trace causality can be is the French film Happenstance where the circumstances under which two of the protagonists meet at the end of the film are the result of a complex chain of random events.
  • There are many, many episodes of the TV series Seinfeld where a final outcome occurs as a result of a convergence of unrelated events in different characters' lives (e.g. the episode where Kramer hitting golf balls out to sea results in George's deception about being a marine biologist being exposed).
  • Also of interest: "Accidents at Sea: Multiple Causes and Impossible Consequences"

Sunday, January 1, 2012

Some Lessons from Simpson's Paradox

Simpson's Paradox is a statistical paradox which at first seems counter-intuitive. Basically, under certain conditions, Person A might achieve a better result than Person B on each of two different tasks, but when the results are combined, Person B may achieve a better overall result. This effect has been found in fields as diverse as medical research, anti-discrimination research and baseball batting averages.

For a numerical example consider the following:
Person A:   Task 1: 64/80 (i.e. 80%)   Task 2: 19/20 (95%)   Overall score: 83/100
Person B:   Task 1: 14/20 (i.e. 70%)   Task 2: 72/80 (90%)   Overall score: 86/100
So although Person A performs better on both tasks, Person B is the overall winner.

So how is this possible? If you think about it, the answer is simple: Person B invested most of their efforts in the task that they were best at, whereas Person A invested most of their efforts in the task they were worst at. It illustrates the old saying that what you lose on the swings, you win on the roundabouts.

There are two lessons we can learn from this that can be applied in business.

Firstly, when comparing the performance of two people, it may not be enough to simply consider their aggregate score. If for example, Task 1 is far more important to your business than Task 2, then the score on Task 1 should be treated as of greater significance than the aggregate result. Similarly, a worker may appear to have a better overall performance (e.g. make fewer mistakes) simply because they are tackling less of the harder work.

Secondly, if you want to compete effectively against someone who is better than you, then the best way is to focus most of your efforts on your strongest area. Trying to be an all-rounder may yield a lower aggregate performance than playing to your strengths.

Note that the paradox can only emerge if the two people allocate different numbers of attempts to each task. If we change the above example so that each person has the same sample size within each task, we get the following:
Person A:   Task 1: 64/80 (i.e. 80%)   Task 2: 19/20 (95%)   Overall score: 83/100
Person B:   Task 1: 56/80 (i.e. 70%)   Task 2: 18/20 (90%)   Overall score: 74/100
So now Person A is the overall winner. Even though the proportions are identical between the two examples, a simple change in sample sizes resulted in a different aggregate outcome. (In this case, both people are investing most of their efforts in their weaker area.)
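
The arithmetic in both examples is easy to check with a few lines of Python (a sketch using only the numbers given above):

```python
def overall(results):
    """results: list of (successes, attempts) pairs, one per task."""
    successes = sum(s for s, _ in results)
    attempts = sum(n for _, n in results)
    return successes, attempts, round(successes / attempts, 2)

# First example: the two people allocate their attempts differently
person_a = [(64, 80), (19, 20)]   # 80% and 95%
person_b = [(14, 20), (72, 80)]   # 70% and 90%
print(overall(person_a), overall(person_b))       # (83, 100, 0.83) vs (86, 100, 0.86): B wins overall

# Second example: same per-task rates, but identical allocations
person_b_same = [(56, 80), (18, 20)]              # still 70% and 90%
print(overall(person_a), overall(person_b_same))  # (83, 100, 0.83) vs (74, 100, 0.74): A wins overall
```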

The overall lesson here is that when comparing performance on multiple dimensions, it may pay to drill down to individual components of that performance rather than simply looking at an aggregate figure, because this may give you a clearer picture of what a person is strong in, or conversely, where they are investing their efforts.