Tuesday 26 May 2009

The Philosophy of Mistakes: Interlude

I'm going round in circles on the subjects of mistakes and testing. As ever, it's because I'm arguing with a voice in my head that doesn't belong there. It belongs to a previous manager of mine, a man with very little technical skill or knowledge and paranoia way into the red zone. What he wanted was the assurance that nothing would ever come out of our pricing engine that could cause him embarrassment. Nothing. Ever. But it all had to be black-box testing, because if he (or anyone else in the team) looked inside the box, they would not really understand what they were looking at.

Now everything I have read on software testing says that the vast majority of mistakes and omissions are found by a thorough code review conducted by two relevantly skilled people. Black-box testing is a very bad way of finding mistakes. Why? Because you don't need many input variables, each with a reasonable range of values, before a “perm all the values” test suite runs to a billion or more records. You can't construct it, let alone find the time to run it. So you have to run a subset – a very small subset of maybe a hundred thousand records. What are the odds that any given flaw falls within that subset? Roughly one in ten thousand – 0.01% at best. This is not going to work. On top of that, you have to construct the test cases, which means you need an engine to produce the “answers” for them – an engine written in software, since no human being is going to work out the answers to more than about a hundred cases accurately by hand – and so you get stuck in an infinite regress of checking the checking.
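To put some (entirely invented) numbers on that, here is a back-of-the-envelope sketch in Python. The rating factors and their value counts are hypothetical – they just stand in for the kind of inputs a pricing engine takes – but the arithmetic is the point:

```python
from math import prod

# Hypothetical rating factors and how many values each can take.
# The numbers are made up; the only point is how fast they multiply.
factor_counts = {
    "driver_age": 83,        # 18..100
    "postcode_area": 124,
    "vehicle_group": 50,
    "vehicle_age": 12,
    "no_claims_years": 10,
    "cover_type": 3,
    "excess_band": 6,
}

full_matrix = prod(factor_counts.values())  # every combination of every value
sample_size = 100_000                       # a generously sized black-box run

print(f"Exhaustive suite: {full_matrix:,} cases")
print(f"Chance any one faulty combination is in the sample: "
      f"{sample_size / full_matrix:.4%}")
```

Seven modest factors already give over a billion combinations, and a hundred-thousand-record run covers less than a hundredth of one per cent of them.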

What my paranoid manager needed to do, in other words, was sit with the programmers and get a detailed walk-through of the code behind the pricing engine: how well had they foreseen problems, and how had they dealt with them? He would then have had an idea of where specific problems might arise and what he should be testing for. But no, that was never going to happen.

Testing goes back to the days when people tried to pass off low-grade precious-metal currency on each other – so you needed to test that you really were getting a genuine shekel, or twenty-carat gold, or whatever. Testing for that is simple. So is testing that the latest batch of cannon aren't going to blow up all over your soldiers – that's why it's called proof-firing, and why places like the Aberdeen Proving Ground are so named. Any gun that survives a dozen shots is unlikely to blow up later – it's in the nature of the beast. As the system gets more complicated, so does the testing. In the end, you have to accept that if you're going to have anything as complicated as a nuclear submarine or a Boeing 747, some things are going to be less perfectly finished than others, and it may even sail or fly with a couple of non-life-threatening snags. That's the nature of the real world. The same goes for software – though the desired standard is more XP SP1 than Windows Vista. Perfect, error-free programs are either very rare if they're complex, or very small if they're plentiful. What we get in ordinary business applications is going to be well short of perfection, unless the application is fairly simple.

Testing is and always was there to prove that under normal circumstances your widget works, not that under some weird conditions it doesn't. If it's a mechanical widget, those weird circumstances may not be too hard to find – too hot, too cold, too much dust – but who foresaw “the wrong kind of snow”? If it's a software widget, those weird circumstances might be lurking in there, yet never occur in use. Software isn't like a fugazi coin, which shows itself up as fake the moment it's put to the test; software can work just fine, except when...
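Here is a toy illustration of that “except when...” – the pricing rule and the numbers are invented, but the shape of the flaw is the classic one: a line that behaves itself for every value the sampled test data happens to contain and goes quietly wrong outside that range.

```python
def premium(base: float, claims_free_years: int) -> float:
    """Toy pricing rule, invented purely for illustration.

    Sensible for the values the sampled test data contains (say 0-15
    claims-free years), but it quietly quotes a negative premium once
    claims_free_years passes 20, because nobody capped the discount.
    """
    return base * (1 - 0.05 * claims_free_years)  # missing cap on the discount

print(premium(500.0, 9))   # 275.0  - looks perfectly reasonable
print(premium(500.0, 25))  # -125.0 - the "except when..."
```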

And no amount of black-box testing will ever find it. What you need is sensible peer review. Like that's going to happen in a modern organisation where everyone is a single point of failure.
