Thursday 8 August 2013

Simple, Fast, Insightful. Pick Two.

There's a really neat presentation by the man with the coolest job in IT-town ("I'm Ira Hunt, I'm CTO of the CIA" and he says it real fast). It has many interesting points that show he gets the whole Big Data thing, and yet, and yet... one of the things he wants is to reduce the dependence on expensive data scientists who are in short supply, and produce a piece of kit that lets regular analysts with degrees in History and Politics from Georgetown search their way through the databases.

Now where have I heard that before? How many times have I heard that before? And why does it never work? Well, here's how the Miracle Information System gets demo'd: "Let's say you want to look at all the e-mails sent by people who are one link removed from Hamid-al-Hamid to people who looked at the al Jazeera video on YouTube about the bombing in Katusk-al-Katusk and then Facebooked a Like? So all you'd do is (makes a few mouse movements) and there's your answer." Applause, cooing and million-dollar orders swiftly follow.

Except. How did the analyst know that those e-mails were there? How did they know how to click that box, then drag that? How did they work out the boolean conditions required for the search? How did Hamid-al-Hamid's name wind up on a drop-down menu? And so on and so forth. Of course when they get the system, the regular analysts won't learn all this stuff, and there will be a handful of guys who do, and they will have dead-end guru jobs for a decade being whizzes on the system.

To handle a data set, however small or large, you need a picture of it in your head: the way it gets fed from the outside world, and the hierarchy of tables within it. To be any good at all with it, you need to know the names of all the major identifying variables and categories (you'll need to tell the computer it's Hamid-al-Hamid the Iraqi terrorist you want, not Hamid-al-Hamid the second-generation US citizen and Queens Halal shop owner). That's often a task of scholarship in itself.

Even if you use a GUI to design the query and cut the SQL (or whatever) for you, you still need to know about join types, boolean operator precedence and lexicographical ordering. Nope, right there, that's lost everyone who doesn't have a STEM background. Seriously. Join types, operator precedence and lexicographical ordering. That's all it takes to stump ninety-five per cent of the population. FOR LIFE. (I am the only person on a floor of one hundred analytical people, including many well-paid SAS analysts, who knows what operator precedence is. Everyone else instinctively keeps what they do simple enough that they don't have to. And that limits how insightful and complex the work can be.)
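If you want to see how little it takes, here's a rough sketch of all three traps in a dozen lines, using Python's built-in sqlite3 module. The tables, column names and people are made up for illustration; nothing here comes from any real system.

```python
import sqlite3

# Hypothetical toy data: table names, columns and people are invented.
SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE email  (sender_id INTEGER, subject TEXT);
INSERT INTO person VALUES (1, 'Ann', 'IQ'), (2, 'Bob', 'US'), (3, 'Cy', 'US');
INSERT INTO email  VALUES (1, 'hello'), (2, 'invoice');
"""

def run(sql):
    """Run one query against a fresh in-memory database and return all rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        return con.execute(sql).fetchall()
    finally:
        con.close()

# Join types: an inner join drops people who sent no e-mail,
# a left join keeps them with NULL (None) in the e-mail columns.
inner = run("SELECT p.name FROM person p "
            "JOIN email e ON e.sender_id = p.id ORDER BY p.id")
left = run("SELECT p.name, e.subject FROM person p "
           "LEFT JOIN email e ON e.sender_id = p.id ORDER BY p.id")

# Operator precedence: AND binds tighter than OR, so the first query means
# country = 'US' OR (country = 'IQ' AND name = 'Bob').
unbracketed = run("SELECT name FROM person "
                  "WHERE country = 'US' OR country = 'IQ' AND name = 'Bob' "
                  "ORDER BY id")
bracketed = run("SELECT name FROM person "
                "WHERE (country = 'US' OR country = 'IQ') AND name = 'Bob' "
                "ORDER BY id")

# Lexicographical ordering: numbers stored as text sort character by character.
as_text = sorted(["9", "10", "100"])
as_numbers = sorted([9, 10, 100])

if __name__ == "__main__":
    print("inner join:  ", inner)
    print("left join:   ", left)
    print("unbracketed: ", unbracketed)
    print("bracketed:   ", bracketed)
    print("text sort:   ", as_text)
    print("numeric sort:", as_numbers)
```

The inner join quietly loses Cy, the unbracketed filter returns an extra person, and the text sort puts "9" last. A GUI will happily generate any of these, and none of them raises an error.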

There are only so many people who can do that, just as there are only so many people who can learn the contents of Gray's Anatomy and a zillion other disconnected facts to become some kind of medic, or who can learn the endless VAT statutes and rulings. No-one suggests replacing surgeons with a nurse and a GUI, and everyone has given up trying to develop "expert systems" for tax legislation, so why do IT guys keep trying to get rid of their surgeon-equivalents, the data scientists (or whatever they get called these days)?

They don't, of course, but they have software to sell, or buy, and projects to run, and promises to make, so they pretend that, yes, you can exploit data as complex as the CIA and NSA have with a neat GUI and a joint honours degree in International Relations and Farsi. (I have nothing but respect for people who can learn Farsi or any other non-native language, it's just that it won't help you design the query you want.) No. You really can't. And unless you put the design of the databases in the hands of people who have an end-to-end appreciation of the issues, you will wind up with some contractor encoding everything in sight without asking anyone who will actually use the data, and then refusing to change anything because you can't demonstrate a business case for doing so. Maybe the CIA don't have that problem. Maybe they can just kill DBAs and sysadmins who won't do as they're told. (Do you think so? Can I work there if they can?)

Nah. Until we can, we are all safe from Big Brother, because Big Bro simply doesn't have the technical chops. Actually, nobody does.

Simple, fast, insightful. Pick two.
