Monday, 22 November 2010

Why Business Insight Hurts The Head

I have a colleague who has a PhD in Algebraic Topology. He’s working on Customer Insight, which is the fancy name companies give to crawling through masses of data trying to figure out how to squeeze an extra buck out of people. The other day I asked him, by way of idle chatter, if he ever wondered why it was that he could calculate homology groups in his head but found the insight work just as hard on the head, if not more so. Somehow it shouldn't be harder to make decent progress getting an answer from a database than from a long exact sequence.

It’s the details, we agreed. Abstract topological spaces tend to be fairly simple, because they are a) infinite and b) smooth. It’s the finite world that’s tricky: think of the thousands of pages of alleged proof of the classification of finite simple groups. Infinite groups are pretty much a doddle by comparison. Or classifying manifolds: it’s only difficult in four dimensions, because there’s a trick that makes it simple, and that trick only works in five or more.

Business analysis is more like combinatorics. It’s finite, with a ton of fiddly details. If you’re an analyst and you don’t have a report with something like “Assign a sale to the X channel unless it’s Thursday and the application came through the internet from Scotland, when you should assign it to the Y channel unless it’s for a gadget not a widget” – if you don’t have something like that, your Sales people aren’t trying hard enough. Or you have really good data stewardship.
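To show how quickly a rule like that turns into fiddly branching, here is a minimal Python sketch of it. The `Sale` record, its field names and the literal values ("internet", "Scotland", "gadget") are all made up for illustration, and the reading of the rule (gadgets stay on X even when the Thursday exception applies) is one interpretation of the wording.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sale:
    # Hypothetical record layout; real tables will name these differently.
    sale_date: date
    source: str   # e.g. "internet", "phone"
    region: str   # e.g. "Scotland"
    product: str  # "gadget" or "widget"


def assign_channel(sale: Sale) -> str:
    """Return "X" by default, "Y" for the Thursday/internet/Scotland exception."""
    is_thursday = sale.sale_date.weekday() == 3  # Monday is 0, so Thursday is 3
    exception_applies = (
        is_thursday
        and sale.source == "internet"
        and sale.region == "Scotland"
    )
    # The exception has its own exception: gadgets stay on the X channel.
    if exception_applies and sale.product != "gadget":
        return "Y"
    return "X"
```

Three conditions and one exception-to-the-exception already give sixteen combinations to check, which is the combinatorics in miniature.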

In business we deal with a large number of tables with a large number of fields, each with its own, often idiosyncratic, definition. Getting data for a business problem goes through the following stages:

Can we translate the problem into data we have?
If not exactly, how inexactly? How good do we think the surrogate variables are?
Can we get data for a simpler version of the problem that still gives us a good decision?
Where do I get that data?
How reliable is it?
How do I link all those tables together?
How do I get the records I want out of that monster?
What bit of syntax have I got wrong this time?
What do you mean “fieldname sally does not exist in object fred?”

And let’s not even get into incompatible date formats, converting data types, converting one set of indicator values to another, using case statements to define groupings and the fiddly syntax of SQL / SAS.
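For a flavour of that housekeeping, here is a rough Python sketch of two of those chores: coaxing dates in several formats into one type, and mapping assorted indicator values onto a single flag. The list of date formats and the indicator codes are assumptions for illustration, not taken from any particular system.

```python
from datetime import date, datetime

# Assumed formats: ISO, UK day/month/year, and a SAS-style 22Nov2010.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d%b%Y")

# Assumed indicator codes seen across different source tables.
INDICATOR_MAP = {"Y": True, "N": False, "1": True, "0": False, "T": True, "F": False}


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def to_flag(value: str) -> bool:
    """Map one of the assumed indicator codes to a boolean."""
    return INDICATOR_MAP[value.strip().upper()]
```

Note that a string like 03/04/2010 is read as 3 April here because the day comes first in the assumed format; a source that writes month first would be silently misread, which is exactly the sort of detail that hurts the head.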

It’s way easier to calculate the homology group of the direct product of a torus with a Klein bottle.
