Wednesday, 1 July 2009

Gruesome Doubts and Testing

It's fairly easy to test that something does what it's supposed to do. That's what specifications and user acceptance testing are for. Let's call a requirement locally positive if it states that under conditions X, Y, Z the widget will do W within T seconds (or after N repetitions). Locally positive requirements are satisfied by a demonstration: here it is doing it. An indefinite positive requirement is one that asks that the widget does W at some unspecified time in the future. Similarly, a requirement is locally negative if it demands the widget doesn't do something within T seconds; it is indefinite negative if it demands the widget never does it. A locally negative requirement can be demonstrated too: we set up the conditions, start up the widget and wait T seconds or N repetitions. No puff of smoke and it passes. How do we prove an indefinite negative or an indefinite positive? We can't: we don't have enough time. No amount of evidence will prove it, because tomorrow something might go wrong or it might go right. And since tomorrow never comes...
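To make the "local" kind concrete, here is a minimal sketch of what a bounded check might look like. Everything in it is an assumption made for illustration: the widget is modelled as a hypothetical step function that returns True on the step where the event of interest happens, and N is passed in as max_repetitions.

```python
from typing import Callable


def make_widget(event_on_step: int) -> Callable[[], bool]:
    """Hypothetical widget: each call is one repetition, and it returns
    True only on the repetition where the event of interest occurs."""
    count = 0

    def step() -> bool:
        nonlocal count
        count += 1
        return count == event_on_step

    return step


def passes_locally_positive(widget: Callable[[], bool], max_repetitions: int) -> bool:
    """The widget must do W within max_repetitions repetitions."""
    return any(widget() for _ in range(max_repetitions))


def passes_locally_negative(widget: Callable[[], bool], max_repetitions: int) -> bool:
    """The widget must NOT do the forbidden thing within max_repetitions repetitions."""
    return not any(widget() for _ in range(max_repetitions))
```

Both checks stop after a known number of repetitions, which is exactly what makes them tests. An indefinite requirement would need max_repetitions to be unbounded, and then the check never returns a verdict.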

Now let's look at some doubts. A doubt is specific if it is about the widget's ability to fulfil a locally positive or negative requirement. Specific doubts are testable. A doubt is gruesome if it claims that under some as yet unknown set of circumstances, the widget will fail to do what we would want it to do if we knew the circumstances.

Why “gruesome”? The philosopher Nelson Goodman invented the predicate “grue”. An object is grue if it is green up to some date and blue afterwards. An emerald might be grue. His point was that if the date is far enough into the future, any evidence that the object is green and will not change colour on the given date is also evidence that it is grue and will change colour on the given date. You object to the idea of “grue” as a colour only if you've forgotten that many leaves are “gred”: green before autumn, and red during autumn. Gruesome doubts amount to the claim that the widget is reliable up to some unspecified time or event in the future and unreliable at or after that point. Or that it is reliable for a wide range of inputs but not for an as yet unspecified but suspected set of inputs. The point is, of course, that any evidence that the widget is reliable is also evidence that it is gruesomely unreliable.
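The same point can be put as a toy sketch. The names here are hypothetical and the "widgets" are just functions of a step number, but it shows why a finite run of tests cannot tell a reliable widget from a gruesome one whose failure point lies beyond the last step observed.

```python
from typing import Callable, List


def reliable_widget(step: int) -> bool:
    """Hypothetical widget that works at every step."""
    return True


def make_gruesome_widget(fails_from: int) -> Callable[[int], bool]:
    """Hypothetical widget that works up to, but not including,
    step fails_from and fails from then on."""

    def widget(step: int) -> bool:
        return step < fails_from

    return widget


def run_tests(widget: Callable[[int], bool], steps_observed: int) -> List[bool]:
    """Collect the evidence: one pass/fail result per observed step."""
    return [widget(step) for step in range(steps_observed)]


gruesome_widget = make_gruesome_widget(fails_from=1000)

# The evidence from 100 steps is identical for both widgets.
same_evidence = run_tests(reliable_widget, 100) == run_tests(gruesome_widget, 100)
```

Here same_evidence is True, and it stays True for any number of observed steps up to 1000. Whoever holds the doubt can always name a later failure point, so running more tests never closes the gap.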

Here's the point. No evidence can satisfy anyone with a gruesome doubt, and there is no point in trying. Gruesome doubts can, if expressed in the right way, give you a reputation for wise caution: “let's keep an eye on it”, you say, or “we should monitor it for any anomalies”, or “the tests seem to indicate that it is working, but I'd like to run some more later”. Gruesome doubts can also be used to bully people: “how do I know it will work with any bit of data / tomorrow / next week?” Don't try these arguments on the bullies, because they won't listen. They are out to make everyone involved look as if they haven't done a good job of testing, and to make them look bad and feel crazy. Why is she asking this when there is no answer? If you're in this position, philosophy is likely to be little consolation.
