Thursday, 25 August 2016

How Not To Tell A Story With Statistics

There was a recent study in the USA showing that 15% of people born between 1990 and 1994 were still virgins, compared with 6% of those born between 1965 and 1969. Headline summary? "Millennials are having less sex."

Okay. We will pass over the difference between "More 24-year-olds are virgins now than they were in 1982" (which is what the numbers say) and "The people who are having sex now have it less often than their parents did" (which is what the headlines said). Let's not complain about journalism.
Statistics creak and out come the freaks, each of them blaming the numbers on their pet peeve about the world today. The explanations range from more young people living at home to low testosterone caused by oestrogen in the water supply. What's wrong with these explanations? The cause is too broad and the effect is too narrow. If Harry is so affected by the oestrogen in the water that he doesn't want to get laid, how come Chad still lost his virginity? As for living at home? Again, the numbers are too large and the effect is too small. It’s just silly.

Sadly, the same old stuff is trotted out by the authors of the original paper, and they are supposed to be smart academics who are on the ball with this stuff. They try to crack the nut of a small (absolute) change in a fringe behaviour (being a virgin at 24) with the hammers of nationwide trends.
What the survey says is that of those born in 1965-69, 95% of women had lost their virginity by 24, compared with 92% of men. Of those born in 1990-94, 84% of women and 86% of men were no longer virgins. Most Americans have had sex at least once by the time they are 24, though it seems the late 1970s and early 1980s were prime sexy time.
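As a quick sanity check, the sex-specific figures can be turned back into virginity rates and compared with the headline numbers. This is a minimal sketch that assumes equal numbers of men and women in each cohort, so the cohort rate is a plain average of the two.

```python
# Percentages who were no longer virgins by 24, from the survey summary above.
no_longer_virgins = {
    ("1965-69", "women"): 95, ("1965-69", "men"): 92,
    ("1990-94", "women"): 84, ("1990-94", "men"): 86,
}

# Virginity rate is simply the complement.
virgin_rate = {key: 100 - pct for key, pct in no_longer_virgins.items()}

# Assumption: men and women are equally represented, so average the two rates.
cohort_average = {
    cohort: (virgin_rate[(cohort, "women")] + virgin_rate[(cohort, "men")]) / 2
    for cohort in ("1965-69", "1990-94")
}

for cohort, rate in cohort_average.items():
    print(f"{cohort}: about {rate}% still virgins at 24")
```

The averages come out at 6.5% and 15%, which line up with the 6% and 15% headline figures, so the survey summary at least hangs together.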

The commentary in the analysis is opaque, and that’s being polite. They did an APC analysis, which stands for Age-Period-Cohort, and to cut a long story short, that should not fill you with warm fuzzy feelings of security. This gives us the two graphs below.

(The axis labelling on these graphs is sloppy. It says “Percentage” and then gives us numbers like 0.02. Is that 2% or 0.02%? If you think that’s picky, try taking a graph mislabelled like that into a meeting with a sharp business manager. You may never be invited back. I’m going to assume they mean 2% when they put 0.02. Otherwise the effects are trivial.)

What these graphs show is never explained, and neither is the idea of a “moderator of the cohort effects” given in this splendid paragraph:
The increase in adult sexual inactivity between the 1960s and 1990s cohorts was larger and significant among women (from 2.3 to 5.4 %) but not among men (from 1.7 to 1.9 %). It was nonexistent among Black Americans (2.6–2.6 %, compared to a significant jump from 1.6 to 3.9 % among Whites). The increase in sexual inactivity was significant only among those without a college education (jumping from 1.7 to 4.1 %) and was nonexistent among those who attended college (2.2–2.2 %). The trend was largest and significant in the East (2–4.5 %), followed by the West (1.7–2.7 %) and Midwest (2.1–3.2 %, not significant), and nonexistent in the South (2.4–2.4 %). The increase was slightly larger and significant among those who attend religious services (2.3–4.3 %) than among those who do not (1.5–3 %, not significant). Many of the differences between groups in recent cohorts were also significant: For example, women were more likely to be sexually inactive compared to men, Whites more than Blacks, those who did not attend college more than those who did, and in the East more than in the West.
No. It’s not you. I do this stuff for a living and I have no idea what these numbers mean. I’m guessing that the percentages are added to some base number to get the virginity rate. For the 1965-69-born women, that’s 3% (period) + 2% (cohort) + 2.3% (gender) = 7.3%, which is an overstatement of the 5% in the survey, and for the 1990-94-born women, that’s 3% (period) + 4% (cohort) + 5.4% (gender) = 12.4%, which is an understatement of the 16%, so maybe we have to add on other things. Or maybe it's multiplicative. I don’t know, and the authors don't explain how we should use all those numbers. As a result, the paper is useless to everyone. (The more I run across this kind of opacity, the more I appreciate the discipline of having to tell a story in business presentations.)
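Here is that additive guess written out, so you can see how far it misses. This is a sketch of my own reading, not the paper's method: the period and cohort values are my eyeball readings of the two graphs, the gender terms are the women's figures from the quoted paragraph (2.3% for the 1960s cohort, 5.4% for the 1990s cohort), and the model "rate = period + cohort + gender" is an assumption, because the authors never say how the pieces combine.

```python
# Observed virginity rates for women at 24 (100% minus those no longer virgins).
observed = {"1965-69": 100 - 95, "1990-94": 100 - 84}

# Assumed components, in percent: graph readings for period and cohort,
# women's figures from the quoted paragraph for the gender term.
terms = {
    "1965-69": {"period": 3.0, "cohort": 2.0, "gender": 2.3},
    "1990-94": {"period": 3.0, "cohort": 4.0, "gender": 5.4},
}

# The guessed additive model: just sum the components.
additive = {cohort: sum(parts.values()) for cohort, parts in terms.items()}

for cohort, predicted in additive.items():
    gap = predicted - observed[cohort]
    print(f"{cohort}: guess {predicted:.1f}% vs observed {observed[cohort]}% ({gap:+.1f} points)")
```

One cohort comes out too high and the other too low, which is why no single base number rescues the additive reading.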

Let’s do some math. The sample size for the 1990-94 cohort is about 1,910 (291/0.152). The 9% rate increase between the 1960s and 1990s cohorts makes 114 people, most of whom, according to this analysis, are white women without a college education. The sample has 955 women (half of people are female), of whom 525 are white (55% of women in the USA are white), and 65% of those (the US figure), or 340, have no college education. If this were the 1960s cohort, that would be 20 virgins. Now there are 20 + 114 = 134 virgins, and the rate amongst white non-college women has gone up nearly seven times, to about 39% in that segment, compared with 6% in the college-girl segment. That gives a blended average virginity rate of nearly 28% for all white women aged 20-24. NATSAL-3 tells us that in the UK almost 20% of men and women were virgins at 24, and half of them went to university.
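The same back-of-envelope calculation as a short script, in case you want to poke at the assumptions. Every input is one of the rough figures in the paragraph above (half the sample female, 55% of women white, 65% without college, a 6% baseline rate, 114 extra virgins all landing in one segment); none of them comes from the paper's own tables, and the variable names are just mine.

```python
# Rough inputs from the paragraph above; these are assumptions, not the paper's data.
sample_9094 = round(291 / 0.152, -1)          # 291 virgins at 15.2% -> about 1,910 people
women = sample_9094 / 2                       # assume half the sample is female -> 955
white_women = round(women * 0.55)             # assume 55% of women are white -> 525
non_college = round(white_women * 0.65, -1)   # assume 65% have no college -> about 340
college = white_women - non_college           # the remaining ~185 white women

baseline_rate = 0.06   # 1960s-cohort virginity rate, applied to both segments
extra_virgins = 114    # the cohort increase, all assigned to white non-college women

non_college_virgins = baseline_rate * non_college + extra_virgins   # about 20 + 114
non_college_rate = non_college_virgins / non_college
college_virgins = baseline_rate * college
blended_rate = (non_college_virgins + college_virgins) / white_women

print(f"sample ~ {sample_9094:.0f}, white non-college women ~ {non_college:.0f}")
print(f"non-college rate ~ {non_college_rate:.1%}, blended rate ~ {blended_rate:.1%}")
```

The non-college rate lands at roughly 39% and the blended rate just under 28%, which is the implausible number the rest of this post is grumbling about.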

At this point I could start speculating as to what might be causing this frankly unbelievable proportion of American virgins. But I won't. I call sampling scheme problems. Or I call something wrong with the APC method. Or both.

And maybe the girls are lying. It’s just a thought. Because it never happens in other surveys.

