Over Easter I have been reading a short, fun book on statistics by Uri Bram, called Thinking Statistically. It trots through some of the obvious pitfalls that statistical reasoning can help you avoid, including the absolute classic: selection bias. I have also been listening to news and phone-in radio programs about the benefit changes that have come in today. This debate contains some truly egregious examples of selection bias (egregious is my big word for today; it means outstandingly bad).
The government’s estimate for the number of households affected by the change in housing benefit, which penalises those with excess rooms, is 660,000. The number of people affected will be larger, of course. This is a large number, and it is not practical for anyone to consider 660,000 cases before making up their mind as to whether they agree or disagree with this policy.
No problem. Over the last two-hundred-ish years statisticians and scientists have come across this sort of problem many times, and they have a solution. It is called sampling. The basic idea is that if you want a rough idea (which should be enough here) of what will happen to 660,000 households, then all you need to do is take a handful of randomly selected households and look at the effect in those cases.
With high probability, this will give you a good idea of what the effect will be on most of the 660,000 households. Then you can decide to be for or against the policy. The UK is a democracy, so those of us over 18 should try to understand our government’s policy in order to make an informed choice in the 2015 general election.
However, the problem is that the handful of households really does need to be randomly selected, i.e., picked from the 660,000 at random, without any bias. Many of the examples in the news media do not look to be selected at random, e.g., the mother of 11 in The Sun. If you select a household at random, the odds against it being headed by a mother of 11 are very long. It looks like The Sun is engaging in very strong selection bias, i.e., selecting not at random but deliberately to make a point.
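To make the difference between random and biased selection concrete, here is a minimal sketch in Python. The household types and their shares are entirely made up for illustration (they are not the government's figures), and the function names are my own; only the 660,000 total comes from the estimate above.

```python
import random

# Hypothetical population: NOT real data. The household types and their
# shares are invented purely to contrast random and biased selection.
POPULATION_SIZE = 660_000  # the government's estimate quoted above
ASSUMED_SHARES = {
    "single person": 0.50,
    "single parent": 0.30,
    "couple": 0.199,
    "very large family": 0.001,
}


def build_population(size, shares, rng):
    """Create a list of household types drawn according to the assumed shares."""
    types = list(shares)
    weights = [shares[t] for t in types]
    return rng.choices(types, weights=weights, k=size)


def random_sample(population, k, rng):
    """Pick k households uniformly at random, without replacement."""
    return rng.sample(population, k)


def biased_sample(population, k, wanted):
    """Pick the first k households of whichever type makes the best story."""
    return [h for h in population if h == wanted][:k]


rng = random.Random(0)  # fixed seed so the run is repeatable
population = build_population(POPULATION_SIZE, ASSUMED_SHARES, rng)

fair = random_sample(population, 5, rng)
cherry_picked = biased_sample(population, 5, "very large family")

print("random sample:", fair)
print("biased sample:", cherry_picked)
```

Under these assumed shares, the random handful is made up mostly of the common household types, while the cherry-picked handful consists only of the rarest type, even though it is a tiny fraction of the population.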
With sufficient selection bias I can ‘prove’ anything I like, e.g., by taking Bill Gates as my sample, I can ‘prove’ that we have an average wealth of $67 billion, and so could easily shrink the deficit by just donating a thousand pounds each – loose change to us billionaires.
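The arithmetic behind that joke can be sketched in a few lines. The $67 billion figure is the one quoted above; the size of the wider population and the $50,000 held by everyone else are assumptions of mine, chosen only to show the gap.

```python
# Figure quoted in the text, in US dollars.
BILL_GATES_WEALTH = 67_000_000_000

# Hypothetical: 999,999 other people, each assumed to hold $50,000.
ASSUMED_OTHER_WEALTH = 50_000
ASSUMED_OTHER_PEOPLE = 999_999


def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# A "sample" of one, chosen deliberately to make the point.
biased_average = average([BILL_GATES_WEALTH])

# The average across the whole hypothetical population.
everyone = [ASSUMED_OTHER_WEALTH] * ASSUMED_OTHER_PEOPLE + [BILL_GATES_WEALTH]
population_average = average(everyone)

print(f"biased sample average: ${biased_average:,.2f}")
print(f"population average:    ${population_average:,.2f}")
```

The hand-picked sample puts the average in the tens of billions, while the average over the whole assumed population is a tiny fraction of that.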
I can’t select from the 660,000 at random, but after a quick look at some government stats, it looks like the two most common household types in this 660,000 are single people living in two-bedroom homes and single-parent families, both with a single excess room. In both cases they will lose £10 to £15 per week. The stats are rather incomplete and don’t say how much they are getting per week before the cuts, but the average housing benefit across all households is £90 per week.
So, my best guess is that if you picked a couple of households at random, you would likely get, say, one household that is someone living on their own, whose benefit of perhaps £70 to £80 per week is cut by around £10 per week, and a second household that is a single parent with two kids, whose benefit of around £100 per week is cut by £15 per week.
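To put those guesses in proportion, here is a small sketch that turns them into percentage cuts. The weekly amounts are my rough guesses from the paragraph above, not official figures, and `percent_cut` is just a helper name I made up.

```python
def percent_cut(weekly_benefit, weekly_cut):
    """Return the cut as a percentage of the weekly benefit before the change."""
    return 100 * weekly_cut / weekly_benefit


# Single person: benefit guessed at 70 to 80 pounds per week, cut by about 10.
single_if_80 = percent_cut(80, 10)
single_if_70 = percent_cut(70, 10)

# Single parent with two kids: benefit guessed at about 100 per week, cut by 15.
single_parent = percent_cut(100, 15)

print(f"single person: {single_if_80:.1f}% to {single_if_70:.1f}%")
print(f"single parent: {single_parent:.1f}%")
```

On these guesses, both households lose somewhere between an eighth and a seventh of their housing benefit.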
Basing a decision about a policy on just two examples is not unreasonable (we can’t spend hours assessing every policy), but the examples have to be chosen randomly. This is difficult, as the examples we are being presented with are quite deliberately selected in a very biased way. I think this is bad for British democracy.