Monday, 29 August 2016

Basquiat Becomes One Of My Things

I saw Julian Schnabel’s movie Basquiat when it came out in the UK, and it lifted my spirits something wonderful. I was intrigued by Basquiat’s art - although the paintings in the movie are by Schnabel, Jeffrey Wright and other assistants. For a long time Basquiat was on my list of “Good Artists Whose Work I Wouldn’t Want On My Walls”.
(Jeffrey Wright actually painting a school-of-Basquiat.)

There was an article on Marion Maneker’s Art Market blog earlier this year about the fact that, though Basquiats do well with collectors and at auction, art museums don’t have many, if any. There are none in Tate Modern. I had the answer as soon as I finished reading the sentence.

Imagine a Basquiat next to all those paintings in Tate Modern: it would simply drown them out. You would realise how damned polite all those Surrealists and whatever else are. From memory of the collection there, only the Rothkos could stand up to the competition.



This sent me back to looking at his paintings again. I bought the affordable and well-illustrated book from the exhibition at the Brooklyn Museum and turned the pages looking at the pictures. This time around I found myself thinking that I wouldn’t mind having some of these on the walls. Whatever it was I had seen but couldn’t respond to, I could now respond to. So a couple of weeks ago, I saw the larger Now’s The Time (published by Prestel and pretty darn affordable) in Waterstone’s Piccadilly and snapped it up. And a pleasure to look through it is.

This happens from time to time. Something I’ve dismissed, ignored or simply filed as “Not My Thing” suddenly becomes one of my things. Time. Experience. More reading, looking and listening. Remind me to tell you how I came to like, rather than simply admire, Eric Rohmer’s movies.

Thursday, 25 August 2016

How Not To Tell A Story With Statistics

There was a recent study in the USA showing that 15% of people born between 1990 and 1994 were still virgins, compared with 6% of those born between 1965 and 1969. Headline summary? "Millennials are having less sex."

Okay. We will pass over the difference between "More 24-year-olds are virgins now than there were in the early 1990s" (which is what the numbers say) and "The people who are having sex now have it less often than their parents did" (which is what the headlines said). Let's not complain about journalism.
Statistics creak and out come the freaks. All of them blaming their pet peeve about the world today. Everything from an increasing number of younger people living at home to low testosterone caused by oestrogen in the water supply. What's wrong with these explanations? The cause is too broad and the effect is too narrow. If Harry is affected by the oestrogen in the water so that he doesn't want to get laid, how come Chad still lost his virginity? As for living at home? Again, the numbers are too large and the effect is too small. It’s just silly.

Sadly, the same old stuff is trotted out by the authors of the original papers, and they are supposed to be smart academics who are on the ball with this stuff. They try to crack the nut of a small (absolute) change of a fringe behaviour (being a virgin at 24) with the hammers of nationwide trends.
What the survey says is that of those born in 1965-69, 95% of women had lost their virginity by 24, compared to 92% of men. Of those born in 1990-94, 84% of women and 86% of men were no longer virgins. Most Americans have had sex at least once by the time they are 24, though it seems the late 1970s and early 1980s were prime sexy time.

The commentary in the analysis is opaque, and that’s being polite. They did an APC analysis, which stands for Age-Period-Cohort, and to cut a long story short, that should not fill you with warm fuzzy feelings of security. This gives us the two graphs below.



(The axis labelling on these graphs is sloppy. It says “Percentage” and then gives us numbers looking like 0.02. Is that 2% or 0.02%? If you think that’s picky, try taking a graph mis-labelled like that into a meeting with a sharp business manager. You may never be invited back. I’m going to assume they mean 2% when they put 0.02. Otherwise the effects are trivial.)

What these graphs show is never explained, and neither is the idea of a “moderator of the cohort effects”, which appears in this splendid paragraph:
The increase in adult sexual inactivity between the 1960s and 1990s cohorts was larger and significant among women (from 2.3 to 5.4 %) but not among men (from 1.7 to 1.9 %). It was nonexistent among Black Americans (2.6–2.6 %, compared to a significant jump from 1.6 to 3.9 % among Whites). The increase in sexual inactivity was significant only among those without a college education (jumping from 1.7 to 4.1 %) and was nonexistent among those who attended college (2.2–2.2 %). The trend was largest and significant in the East (2–4.5 %), followed by the West (1.7–2.7 %) and Midwest (2.1–3.2 %, not significant), and nonexistent in the South (2.4–2.4 %). The increase was slightly larger and significant among those who attend religious services (2.3–4.3 %) than among those who do not (1.5–3 %, not significant). Many of the differences between groups in recent cohorts were also significant: For example, women were more likely to be sexually inactive compared to men, Whites more than Blacks, those who did not attend college more than those who did, and in the East more than in the West.
No. It’s not you. I do this stuff for a living and I have no idea what these numbers mean. I’m guessing that the percentages are added to some base number to get the virginity rate. For the 65-69-born women, that’s 3% (period) + 2% (cohort) + 1.7% (gender) = 6.7%, which is an overstatement, and for the 90-94-born women, that’s 3% (period) + 4% (cohort) + 5.4% (gender) = 12.4% which is an understatement, so maybe we have to add on other things. Or maybe it's multiplicative. I don’t know, and the authors don't explain how we should use all those numbers. As a result, the paper is useless to everyone. (The more I run across this kind of opacity, the more I appreciate the discipline of having to tell a story in business presentations.)

Let’s do some math. The sample size for the 90-94-born is about 1,910 (291/0.152). The nine-point rise between the 60s and 90s cohorts makes about 172 extra virgins, most of whom, according to this analysis, are white non-college women. The sample has 955 women (half of people are female), of whom 525 are white (55% of women in the USA are white), and 65% of those (the non-college share in the USA), or about 340, are non-college-educated. At the 60s-cohort rate, that segment would hold about 20 virgins. Now it holds 20+172 = 192 virgins, and the rate amongst white non-college women has gone up more than nine times, to about 56% in that segment, compared with 6% in the college-girl segment. That gives a blended average virginity rate of nearly 40% for all white women 20-24. NATSAL-3 tells us that in the UK almost 20% of men and women were virgins at 24, and half of them went to university.
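For the sceptical, here is that back-of-envelope arithmetic in one place, using the rounded inputs quoted above (the exact survey counts will differ a little):

```python
# Rough replication of the arithmetic above, all figures rounded.
sample = 1910                       # 90-94-born respondents (291 / 0.152)
extra_virgins = 0.09 * sample       # nine-point rise in the virginity rate

women = sample * 0.5                # half of people are female
white_women = women * 0.55          # 55% of US women are white
non_college = white_women * 0.65    # 65% without a college education

baseline = 0.06 * non_college       # virgins that segment would hold at the 60s rate
segment_rate = (baseline + extra_virgins) / non_college

print(round(extra_virgins), round(non_college), round(segment_rate * 100))
```

If most of the extra virgins really do sit in one segment of about 340 women, the implied segment rate comes out somewhere north of half, which is the point: the claimed concentration does not survive contact with the sample size.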

At this point I could start speculating as to what might be causing this frankly unbelievable proportion of American virgins. But I won't. I call sampling scheme problems. Or I call something wrong with the APC method. Or both.

And maybe the girls are lying. It’s just a thought. Because it never happens in other surveys.

Monday, 22 August 2016

Denmark Street


Denmark Street, home of guitars, keyboards, and all things needed to start your very own band. Ever since Crossrail started, there have been rumours that the music shops are going to be ejected and replaced by something less funky that pays more rent. This visit round, I'm sure there were more restaurants than before. The young lady who sold me a set of light-gauge steel D'Addario strings for my trusty acoustic told me that most of the musical retailers have twenty-year leases on their shops. That doesn't mean someone might not come along and give the leaseholders bucket-loads of cash to sell, but the point is she didn't say "Oh yes, we're living from month to month, no-one knows."

Apparently people come from all over the world to the famous Denmark Street to look at and buy musical instruments. I've looked, on and off, most of my life. To someone who has never seen it, it must look deliciously tatty and romantic. After all, what matters is the National Steel in the window, not the window.

Monday, 8 August 2016

An Introduction to Andrew Gelman's Garden of Forking Paths

The Garden of Forking Paths is an idea introduced in a paper by Andrew Gelman and Eric Loken that should be understood by everyone who uses statistics and analyses data.

Context for those unfamiliar with statistics. For a long time, and in many journals even now, research would only get published if it was “statistically significant”, which usually meant that the result had a p-value of less than 5% (a figure chosen arbitrarily). The p-statistic can be calculated from the data and a hypothesis about the distribution of the data. This gave rise to the practice of “p-hacking” or “fishing” – looking through data, excluding this and grouping that, recalculating the p-statistic, until one found a result with p < 5%, which was then published. Many of these results turned out to be un-reproducible by other researchers.
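To see how easy fishing is, here is a toy sketch with made-up data (nothing to do with any real study): the data is pure noise, but run enough subgroup comparisons and something will come out “significant” at the 5% level.

```python
import random

random.seed(1)

def mean(xs):
    return sum(xs) / len(xs)

def p_value(a, b, trials=2000):
    # Crude permutation test: how often does shuffled data match or beat
    # the observed gap between the two group means?
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(trials):
        random.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return hits / trials

# Twenty independent subgroup comparisons, every one of them pure noise.
discoveries = 0
for subgroup in range(20):
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    if p_value(a, b) < 0.05:
        discoveries += 1
print(discoveries)  # on average about one in twenty: noise dressed up as a result
```

One “discovery” per twenty tests is exactly what a 5% threshold hands you for free, which is why a published p-value tells you nothing unless you also know how many paths were tried.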

In the old-school approach, a researcher is supposed to formulate an hypothesis, and run an experiment to test it. If the results of the experiment are insufficiently probable under the hypothesis, the hypothesis has to be rejected. What counts (classically) as "insufficiently probable” is a value of the p-statistic of less than 5%. What you’re not allowed to do is throw away data you don’t like and change the hypothesis to suit the data that’s left. That’s downright dishonest. You have to take all the data, and there are complicated rules about what to do when subjects drop out of the study and other such eventualities. This is how the old-school founders worked. Much of their work was in agriculture and industry, and R. A. Fisher really did divide his plot of land on an agricultural research station, treat each patch of soil differently, plant the potatoes and stand back to see what happened. He had no previous theories, and if he did, the potatoes would decide which one was better.

In epidemiology, political science, social science, longitudinal health and lifestyle tracking surveys and other subjects, the experiments are neither as simple nor as immediately relevant, and may not even be possible to conduct. The procedure is often reversed: the data appears first, and the hypotheses and statistical analysis are done afterwards. This is how businessmen read their monthly accounts and sales reports. Often those businessmen are expecting to see certain changes or figures, and when they don’t, want to know why (“We doubled advertising in Cornwall, why haven’t the sales increased? What are they playing at down there?”). Researchers in social sciences and epidemiology also come bristling with pet theories, some of which they are obliged to adopt by the prevailing academic mores.

Under these circumstances, the data is scanned by very practised eyes for patterns and trends that the readers expect to find. If there seem to be no such patterns, those same eyes will look a little harder to find places where they can see the patterns they want, or at least some patterns that make sense of the lack of expected results. Researchers looking at diet know but cannot say that the less educated are less healthy and eat worse food, because they cannot afford better. So the researchers scan the data and blame bacon and eggs, or whatever else is believed to be eaten by the lower classes. This saves the researchers' grants and jobs.

However, the next survey finds that eating bacon and eggs did not alter the health of the people who ate it. Though nobody will ever know, this is because, in the first sample, the people who ate bacon and eggs were mostly older unemployed English people who did not exercise, whereas in the second survey, they were mostly Romanian builders in their late twenties who also played football at the weekends.

What happens in this practised data scanning? It is a series of decisions to select these data points, and group those properties, and maybe construct a joint index of this and that variable. It may include comparing the usual summary statistics, looking at histograms, time series, scatter graphs and linear regressions, and maybe even running a quick-and-dirty logistic regression, GLM or cluster analysis. All this can be done in SAS or R, and much of it in Excel, in a few moments by a reasonable analyst. Speaking from experience, it does not feel any more sophisticated than looking at the raw numbers, and so, because familiarity breeds neutrality, all this is seen as part of the “observation process” rather than the hypothesis-formation and testing process. (Methodological aside: plenty of people still think that observation is a theory-free process that generates unambiguous “hard facts”, or that it is possible to have observations that may involve theories but are still neutral between the theories being tested, and so “relatively hard facts”. The word has not got out far enough.)

These decisions about data choice and variable definition are what Gelman and Loken call the “Garden of Forking Paths”. Their point is that to get the bad result about bacon and eggs we took one path, but we could have taken another and found no result at all. And if we had used all the data, we would have found nothing. The error is to present the result of the data-scanning, the walk down the Forking Path, as if the whole survey provided the evidence for it, instead of a very restricted subset of the data chosen to provide exactly that result.

The Forking Paths we take through the Garden of Data in effect create idiosyncratic populations that would never be used in a classical test, or which are so specialised that it is impossible to carry over the result to the general population. The decisions that are made almost unconsciously in that practised data scanning seem to produce evidence for a conclusion, but the probability of obtaining that evidence again is minimal. That is the key point. When the old-school statisticians did their experiments on potatoes, they could be fairly sure, based on what they knew about soil and potatoes, that the exact patch of ground they chose would not matter. Another patch would yield different results, but within the expected variations. The probability that their results would be reproducible was high. When researchers walk along a Forking Path, they risk losing reproducibility and therefore a broader relevance.
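That loss of reproducibility can be sketched with invented data: let practised eyes pick the most striking subgroup in one noise-only survey, then look at the same subgroup in a fresh survey of the same kind.

```python
import random

random.seed(7)

def draw_survey(n=400):
    # Each respondent: an arbitrary subgroup label (0-7) and a pure-noise outcome.
    return [(random.randrange(8), random.gauss(0, 1)) for _ in range(n)]

def subgroup_means(survey):
    totals, counts = {}, {}
    for g, y in survey:
        totals[g] = totals.get(g, 0.0) + y
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

first = subgroup_means(draw_survey())
best = max(first, key=lambda g: abs(first[g]))  # the path we "happened" to take
second = subgroup_means(draw_survey())          # a fresh survey, same design

print(round(first[best], 2), round(second[best], 2))
# The striking subgroup mean from the first survey typically shrinks back
# towards zero in the second: the evidence at the end of the path was improbable.
```

No explicit hacking happened there; picking the most interesting-looking subgroup is all it takes, because the maximum of several noisy means is itself an improbable draw.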

That’s why so many attention-grabbing results are never reproduced: the evidence lying at the end of the Forking Path was itself improbable. Nobody cheated overtly; they just chose what made a nice story but didn’t then check on the probability of the evidence itself. Practised data scanning, or a good stroll through the Garden of Forking Paths, can give you a good value for P(Nice_Story | Evidence), but P(Evidence) can be almost zero, and so P(Nice_Story and Evidence) = P(Nice_Story | Evidence) × P(Evidence) is also nearly zero (if P(Nice_Story | Evidence) = 0.9 but P(Evidence) = 0.001, the product is 0.0009), and Nice_Story really is just a fiction.

The difference between outright p-hacking and practised data scanning is subtle, but it is politically important. p-hacking is clearly dishonest, and heaven forbid pharmaceutical companies should do it. Forking Paths is just, well, an understandable temptation. Gelman and Loken stress how natural a temptation it is, as if to excuse it, but of course, if it is a natural temptation, the Virtuous Analyst will take care to resist it.

What Virtuous Analysts want to know is: how does one take a pre-existing data set and avoid the Garden of Forking Paths? Isn’t that an analyst’s job? Isn’t that why businesses have all that data? Because in amongst all that dross is the gold that will double sales and profits overnight? So suppose as a result of a thorough stroll round the Garden, I find what my manager wants to hear: that when sales of product A increase, sales of product B decrease. Product B, of course, is his, and product A belongs to a rival in the same organisation. This result holds only during periods of specific staff incentives in larger stores and not during the school holidays, and that makes up 65% of the sales during those periods. Everywhere else during those times, there is no relationship, and in the small stores at all times there is no relationship. That’s what I tell my manager, with all the caveats. It’s his decision whether to simplify it for the higher-ups. The Virtuous Analyst does not anticipate political or commercial decisions, but leaves that to the politicians and commercial managers.

Virtue sometimes hangs on a nuance.