Thursday, 11 August 2016
Monday, 8 August 2016
An Introduction to Andrew Gelman's Garden of Forking Paths
The Garden of Forking Paths is an idea introduced in a paper by Andrew Gelman and Eric Loken that should be understood by everyone who uses statistics and analyses data.
Context for those unfamiliar with statistics. For a long time, and in many journals even now, research would only get published if it was “statistically significant”, which usually meant that the result had a p-value of less than 5% (a figure chosen arbitrarily). The p-value can be calculated from the data and a hypothesis about the distribution of the data. This gave rise to the practice of “p-hacking” or “fishing” – looking through data, excluding this and grouping that, recalculating the p-value, until one found a result that had p < 5%, which one then published. Many of these results turned out to be unreproducible by other researchers.
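To see why fishing works so well, here is a minimal sketch in Python. It is not from Gelman and Loken's paper: the function name is mine, and it assumes the tests are independent and that there is no real effect anywhere in the data. It computes the chance that at least one of a number of looks at the data clears the 5% bar by luck alone.

```python
def chance_of_false_positive(looks, alpha=0.05):
    """Probability that at least one of `looks` independent tests gives
    p < alpha when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** looks


if __name__ == "__main__":
    for looks in (1, 5, 20, 50):
        chance = chance_of_false_positive(looks)
        print(f"{looks:>2} looks: {chance:.0%} chance of a 'significant' result")
```

With twenty independent looks the chance is already well over half. Real subgroup analyses overlap, so the tests are not independent and the exact figure will differ, but the direction is the same.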
In the old-school approach, a researcher is supposed to formulate an hypothesis, and run an experiment to test it. If the results of the experiment are insufficiently probable under the hypothesis, the hypothesis has to be rejected. What counts (classically) as “insufficiently probable” is a p-value of less than 5%. What you’re not allowed to do is throw away data you don’t like and change the hypothesis to suit the data that’s left. That’s downright dishonest. You have to take all the data, and there are complicated rules about what to do when subjects drop out of the study and other such eventualities. This is how the old-school founders worked. Much of their work was in agriculture and industry, and R A Fisher really did divide his plot of land on an agricultural research station, treat each patch of soil, plant the potatoes and stand back to see what happened. He had no previous theories, and if he did, the potatoes would decide which one was better.
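The classical logic can be sketched with a randomisation (permutation) test, the kind of argument associated with Fisher's field experiments. This is an illustration only: the yields below are invented, and the function name and parameters are my own assumptions. The hypothesis under test is that the treatment makes no difference, in which case the labels "treated" and "control" are arbitrary and can be shuffled.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, shuffles=5000, seed=0):
    """Estimate how often a random relabelling of the plots gives a
    difference in mean yield at least as large as the one observed."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n = len(treated)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        # Treat the first n plots as "treated" under this relabelling.
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            hits += 1
    return hits / shuffles


if __name__ == "__main__":
    # Hypothetical potato yields per patch, not real data.
    treated = [29.1, 30.4, 31.2, 30.8, 29.9, 31.5]
    control = [27.8, 28.5, 29.0, 28.1, 27.5, 28.9]
    print(permutation_p_value(treated, control))
```

The point of the design is that the hypothesis, the plots and the test are all fixed before the potatoes come up. Nothing is chosen after looking at the yields.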
In epidemiology, political science, social science, longitudinal health and lifestyle tracking surveys and other subjects, the experiments are not as simple nor as immediately relevant, and may not even be possible to conduct. The procedure is often reversed: the data appears first, and the hypotheses and statistical analysis are done afterwards. This is how businessmen read their monthly accounts and sales reports. Often those businessmen are expecting to see certain changes or figures, and when they don’t, want to know why (“We doubled advertising in Cornwall, why haven’t the sales increased? What are they playing at down there?”). Researchers in social sciences and epidemiology also come bristling with pet theories, some of which they are obliged to adopt by the prevailing academic mores.
Under these circumstances, the data is scanned by very practised eyes for patterns and trends that the readers expect to find. If there seem to be no such patterns, those same eyes will look a little harder to find places where they can see the patterns they want, or at least some patterns that make sense of the lack of expected results. Researchers looking at diet know but cannot say that the less educated are less healthy and eat worse food, because they cannot afford better. So the researchers scan the data and blame bacon and eggs, or whatever else is believed to be eaten by the lower classes. This saves the researchers' grants and jobs.
However, the next survey fails to find that eating bacon and eggs altered the health of the people who ate it. Though nobody will ever know, this is because, in the first sample, the people who ate bacon and eggs were mostly older unemployed English people who did not exercise, whereas in the second survey, they were mostly Romanian builders in their late twenties who also played football at the weekends.
What happens in this practised data scanning? It is a series of decisions to select these data points, and group those properties, and maybe construct a joint index of this and that variable. It may include comparing the usual summary statistics, looking at histograms, time series, scatter graphs and linear regressions, and maybe even running a quick-and-dirty logistic regression, GLM or cluster analysis. All this can be done in SAS or R, and much of it in Excel, in a few moments by a reasonable analyst. Speaking from experience, it does not feel any more sophisticated than looking at the raw numbers, and so, because familiarity breeds neutrality, all this is seen as part of the “observation process” rather than the hypothesis-formation and testing process. (Methodological aside: Plenty of people still think that observation is a theory-free process that generates unambiguous “hard facts”, or that it is possible to have observations that may involve theories but are still neutral between the theories being tested, and so “relative hard facts”. The word has not got out far enough.)
These decisions about data choice and variable definition are what Gelman and Loken call the “Garden of Forking Paths”. Their point is that to get the bad result about bacon-and-eggs we took one path, but we could have taken another and not found any result at all. And if we used all the data, we would have found nothing. The error is to present the result of the data-scanning, the walk down the Forking Path, as if the whole survey provided the evidence for it, instead of a very restricted subset of the data chosen to provide exactly that result.
The Forking Paths we take through the Garden of Data in effect create idiosyncratic populations that would never be used in a classical test, or which are so specialised that it is impossible to carry over the result to the general population. The decisions that are made almost unconsciously in that practised data scanning seem to produce evidence for a conclusion, but the probability of obtaining that evidence again is minimal. That is the key point. When the old-school statisticians did their experiments on potatoes, they could be fairly sure, based on what they knew about soil and potatoes, that the exact patch of ground they chose would not matter. Another patch would yield different results, but within the expected variations. The probability that their results would be reproducible was high. When researchers walk along a Forking Path, they risk losing reproducibility and therefore a broader relevance.
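The loss of reproducibility can be simulated. The sketch below is my own illustration, not Gelman and Loken's method. It invents surveys in which health is pure noise, unrelated to diet; the attribute names, the sample size and the rough normal-approximation test are all assumptions. The analyst walks every subgroup path in a first survey, keeps the most striking one, and then looks for the same result in a second survey.

```python
import random
from itertools import product
from math import sqrt
from statistics import NormalDist, mean, stdev

ATTRIBUTES = ["older", "employed", "exercises"]  # hypothetical yes/no fields


def make_survey(rng, n=400):
    """A null survey: health is random and unrelated to everything else."""
    return [
        {"eats_bacon": rng.random() < 0.5,
         **{a: rng.random() < 0.5 for a in ATTRIBUTES},
         "health": rng.gauss(0, 1)}
        for _ in range(n)
    ]


def all_paths():
    """Every subgroup definable from the attributes: each one is required
    to be True, required to be False, or ignored (27 paths in total)."""
    return [
        {a: v for a, v in zip(ATTRIBUTES, choice) if v is not None}
        for choice in product([None, True, False], repeat=len(ATTRIBUTES))
    ]


def subgroup_p(survey, path):
    """Two-sided p-value (normal approximation) for bacon eaters versus
    the rest, restricted to the subgroup picked out by `path`."""
    rows = [r for r in survey if all(r[a] == v for a, v in path.items())]
    eaters = [r["health"] for r in rows if r["eats_bacon"]]
    others = [r["health"] for r in rows if not r["eats_bacon"]]
    if len(eaters) < 2 or len(others) < 2:
        return 1.0  # too few people to say anything
    se = sqrt(stdev(eaters) ** 2 / len(eaters) + stdev(others) ** 2 / len(others))
    z = (mean(eaters) - mean(others)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def forking_path_rates(trials=200, seed=1):
    """Return (share of first surveys in which some path gives p < 0.05,
    share of those findings that repeat in a fresh survey)."""
    rng = random.Random(seed)
    found = replicated = 0
    for _ in range(trials):
        first, second = make_survey(rng), make_survey(rng)
        best = min(all_paths(), key=lambda path: subgroup_p(first, path))
        if subgroup_p(first, best) < 0.05:
            found += 1
            if subgroup_p(second, best) < 0.05:
                replicated += 1
    return found / trials, replicated / max(found, 1)


if __name__ == "__main__":
    found_rate, replication_rate = forking_path_rates()
    print(f"a 'finding' turned up in {found_rate:.0%} of first surveys")
    print(f"and survived in {replication_rate:.0%} of second surveys")
```

Because there is nothing to find in these invented surveys, any "finding" is a product of the path taken, and it should repeat in the second survey only about as often as chance allows.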
That’s why so many attention-grabbing results are never reproduced: because the evidence lying at the end of the Forking Path was itself improbable. Nobody cheated overtly, they just chose what made a nice story but didn’t then check on the probability of the evidence itself. Practised data scanning, or a good stroll through the Garden of Forking Paths, can give you a good value for P(Nice_Story | Evidence), but P(Evidence) can be almost zero, and so P(Nice_Story and Evidence) = P(Nice_Story | Evidence) * P(Evidence) is also nearly zero, and Nice_Story really is just a fiction.
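To put invented numbers on that product (both figures below are hypothetical, and the function name is mine): suppose the story looks 90% convincing given the evidence, but evidence like that would turn up in only 1 survey in 100.

```python
def joint_probability(p_story_given_evidence, p_evidence):
    """P(story and evidence) = P(story | evidence) * P(evidence)."""
    return p_story_given_evidence * p_evidence


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the arithmetic.
    print(joint_probability(0.9, 0.01))
```

However persuasive the story is once the evidence is in hand, the product can never be larger than the probability of the evidence itself.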
The difference between outright p-hacking and practised data scanning is subtle, but it is politically important. p-hacking is clearly dishonest, and heaven forbid pharmaceutical companies should do it. Forking Paths is just, well, an understandable temptation. Gelman and Loken stress how natural a temptation it is, as if to excuse it, but of course, if it is a natural temptation, the Virtuous Analyst will take care to resist it.
What Virtuous Analysts want to know is: how does one take a pre-existing data set and avoid the Garden of Forking Paths? Isn’t that an analyst’s job? Isn’t that why businesses have all that data? Because in amongst all that dross is the gold that will double sales and profits overnight? So suppose as a result of a thorough stroll round the Garden, I find what my manager wants to hear: that when sales of product A increase, sales of product B decrease. Product B, of course, is his, and product A belongs to a rival in the same organisation. This result holds only during periods of specific staff incentives in larger stores and not during the school holidays, and that makes up 65% of the sales during those periods. Everywhere else during those times, there is no relationship, and in the small stores at all times there is no relationship. That’s what I tell my manager, with all the caveats. It’s his decision whether to simplify it for the higher-ups. The Virtuous Analyst does not anticipate political or commercial decisions, but leaves that to the politicians and commercial managers.
Virtue sometimes hangs on a nuance.
Labels:
Business
Thursday, 4 August 2016
Monday, 1 August 2016
The Vanishing Abandoned Citroen
Informing the council about a nuisance never does any good. They need you to establish a “history” of the bad activity and that can take months. I called the animal people out one evening many years ago because of a dog that had been left alone in a house two doors up and would bark non-stop all night. Two people turned up and promptly said that they were not going to knock on the door as the dog was clearly dangerous and would I keep records of how often this happened so they could talk to the occupants. Gee thanks. I’m paying taxes to feed and house you guys?
Anyway, the other week I got fed up of the heap in the photographs bringing the tone of my street down and more to the point taking up a parking space, so I trotted out, took some photographs, and filed an online report with the Council. I expected to hear nothing, or possibly to be told I would be arrested for a hate crime as the owner was Diverse, or something. Indeed, the next day, a Man From The Council called me and said mine wasn’t the first complaint about the car, he had finally located the owner who lived locally and had dropped a card through his door. Abandoned cars mostly get crushed, so I get that the Council doesn’t want to be hasty, but my caller was talking about “building up a history” and when bureaucrats do that, I assume nothing will happen for at least a year.
The next evening… it was gone. The owner must have moved it. God alone knows what value they attached to it, but clearly it was enough to make them move it before the Council crushed it.
Miracles.
Labels:
Society/Media
Thursday, 28 July 2016
Castro, ISIS and Weaponising Fake Refugees
Can you keep count of how many young Arab men are killing Europeans? I can’t. And more and more they aren’t suicide bombers, but berserkers. Driving a truck along a crowded Promenade des Anglais is not a political act, but the act of a mentally unstable person off his head on drugs. The same with axe-killers. These people are crazy, as in, psychiatrically crazy. Where have we seen this before?
In the 1980’s and 1990’s Cuba generated hundreds and then thousands of “refugees”, all of whom gained sympathy from well-meaning liberals who would never have to live next to one of them. It didn’t take the US long to realise that Castro had been clearing out his jails, dumping HIV carriers and packing off homosexuals and other people he didn’t want. In addition, Cuba was short on food, so letting a few thousand people go not only cured some political and social-order problems, it exported Castro’s food shortage problem as well.
Oh. Wait. Where else has had terrible harvests for the last few years? That would be Syria. And probably any other Arab country be-devilled by civil war and insurrection. Got a problem feeding the people in your Caliphate? Take a lesson from Castro.
I’m betting that ISIS, the Taliban and others emptied out the jails in Syria, Iraq, Afghanistan and anywhere else they took over. And not just the jails but the mental hospitals as well. After all, what’s more dangerous than a bomb you know when it’s going to go off? A bomb you don’t know when it’s going to go off. They sent large numbers of hungry, and therefore angry, young men off as well. So they exported their food problem, and their social-order problem.
Everyone at the time remarked on how strange it was that the “refugees” from the Arab / Muslim countries seemed to be vigorous young men who were strangely well-informed about where to go and what to do on arrival. Those vigorous young “refugees” groping Europe’s daughters are uneducated farm-boys, so where did they get the money to pay for the people-smuggling bit of the journey? Um. Maybe they didn’t. Maybe the smugglers got paid by the boatload by ISIS, and everyone was told to say they had to pay the people-smugglers, because the dumb Europeans would believe that and give them lots of money.
The equally dumb German politicians thought they were getting the fleeing Syrian middle-class on the cheap. Not so much. They were getting the low-skilled, the insane, the criminal and probably the AIDS carriers as well. If that’s not an invasion, I don’t know what is.
Look for a wave of "political refugees" from Turkey as Erdogan empties his jails, addicts and asylums into the open arms of Germany. He will dump them off the shores of Greece for sure, and the Greeks will fire them at the Germans.
Labels:
Brexit
Monday, 25 July 2016