Showing posts with label Maths. Show all posts

Tuesday 30 April 2024

Angela Collier

That would be Doctor Angela to you, though she’s working in the private sector. She has a mug with a proton on one side and a hydrogen ion on the other, because that’s a joke about physicists and chemists (it’s the same thing). She’s darn good at explaining science stuff and isn’t afraid to throw math at you, because of course you math, right? This is a good explanation of temperature, but be warned, the explanation involves entropy.

Tuesday 30 May 2023

How To Translate Faraday's Law of Induction into Math

I know what you're thinking. What does he get up to that stops him posting promptly and prolifically? I wish it had something to do with Instagram models and/or staying up late making music via Garageband, but it is much more mundane than that. Here's a short passage about the translation of Faraday's Law of Induction into mathematical notation that I've been working on for far longer than you might think. If I've done my job well, it should seem obvious. (Some of the original \LaTeX has been butchered to accommodate Blogger.)

(starts)

Faraday's Law, more or less as stated by Faraday, is: the electromotive force around a closed path is equal to the negative of the time rate of change of the magnetic flux enclosed by the path. How does this get translated into mathematical notation? We need to know that the `electromotive force' is, in the case of magnetic induction, the work done on an elementary electric charge (such as an electron) travelling once around the loop. Work done moving along a path is always a line integral of the product of a force and a displacement (since `work = force times distance').

As a first step, we re-name those things as variables or constants:

let $\mathcal{E}$ be the electromotive force

let $B$ be the magnetic field (its flux through a surface is the integral of $B$ over that surface)

let $\partial A$ be the path, enclosing a surface $A$

let $ds$ be a small displacement along $\partial A$

let $E$ be the electric field

We can write down the equations quite easily if we are familiar with the vector calculus. Work done is given by the mantra `work = force times distance'. For a small displacement $ds = (dx, dy, dz)$ and a force $E = (E_x, E_y, E_z)$ the product is $E_x dx + E_y dy + E_z dz$ which is $E \cdot ds$ in vector notation. The work done along a line is the sum of such displacements along it, which is conventionally shown by the integral $\oint_{\partial A} E \cdot ds$, giving us $\mathcal{E} = \oint_{\partial A} E \cdot ds$.

To translate the other side of Faraday's Law, recall that Faraday thought of electromagnetic fields as `lines of force' - the more lines, the more force - and the flux of a field through an area was the number of lines of force through it. This was Faraday's way of thinking about line and surface integrals without having to actually use either.

The number of lines of force within a path is the integral of the (strength of the) vector field over any smooth surface enclosed by that path. (The `any' has to be proved, but it becomes intuitively obvious after visualising a few examples.) So if we take a surface $A$, divide it into non-overlapping patches $dA(n)$, calculate $\frac{\partial B}{\partial t}(n) \cdot dA(n)$ at the centre of the $n$-th patch, and add the results, we get an estimate of the rate of change of the magnetic flux through $A$. Make the patches smaller, and we get a better estimate, which in the limit is the flux integral; Faraday's Law then reads

$\mathcal{E} = -\iint_{A} \frac{\partial B}{\partial t} \cdot dA$

That can also be turned into a conventional double integral by substituting coordinates. Hence Faraday's Law of Induction is translated into mathematical notation as

$ \oint_{\partial A} E \cdot ds = -\iint_{A} \frac{\partial B}{\partial t} \cdot dA$

The left-hand side is the work done, and the right-hand side is the negative of the time rate of change of the magnetic flux enclosed by the path. This completes the translation of Faraday's Law into mathematical notation.
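The `limit of finite sums' reading of these integrals can be checked numerically. Here is a short Python sketch (mine, not part of the original passage): it approximates the work integral of a toy field $E = (-y, x, 0)$ around the unit circle by summing $E \cdot ds$ over small chords. The curl of this field is the constant $2$, so by Stokes' theorem the exact answer is $2\pi$, twice the enclosed area.

```python
import math

def line_integral(E, n=100_000):
    """Approximate the work integral of E around the unit circle
    by summing E . ds over n small chords."""
    total = 0.0
    for k in range(n):
        t0 = 2 * math.pi * k / n
        t1 = 2 * math.pi * (k + 1) / n
        # midpoint of the chord, and the displacement ds along it
        xm, ym = math.cos((t0 + t1) / 2), math.sin((t0 + t1) / 2)
        dx, dy = math.cos(t1) - math.cos(t0), math.sin(t1) - math.sin(t0)
        Ex, Ey = E(xm, ym)
        total += Ex * dx + Ey * dy   # E . ds, i.e. `work = force times distance'
    return total

# Toy field E = (-y, x): its curl is the constant 2, so the line
# integral around the unit circle should come out as 2 * pi.
work = line_integral(lambda x, y: (-y, x))
```

Make `n` bigger and the sum creeps closer to $2\pi$, which is exactly the limit the integral sign denotes.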

This is no more conceptually complicated than if we had translated, say, a passage of Freud from German to English. There is no word-for-word mapping of the two languages, and there are many concepts for which there is a German word, but not an English one, and one must attempt to explain the German concept in English. Using an integral to denote the result of a limit of finite sums is no more exceptional than using a derivative to denote the result of taking rates of change over ever smaller intervals.

We can use some maths to go further. By Stokes' theorem, assuming the fields are sufficiently smooth, we have

$\oint_{\partial A} E \cdot ds = \iint_A \nabla \times E \cdot dA$

So we can put

$\iint_A \nabla \times E \cdot dA = -\iint_A \frac{\partial B}{\partial t} \cdot dA$

which gives us immediately one of Maxwell's equations

$\nabla \times E = -\frac{\partial B}{\partial t}$

We can prove that, with the rest of Maxwell's equations, this is another statement of Faraday's Law of Induction.

This is no more conceptually complicated than if, having translated the passage of Freud, we then drew a conclusion from the translation and some background knowledge that was not in the original, but helps us understand what Freud was saying. It just looks impressive / mysterious / difficult because it uses undergraduate maths.

(ends)

My thesis is that translating from a natural language into math notation is the same as translating from one natural language to another. It's just that maths is the language in which it is easier to see the patterns and make the deductions.

Friday 17 June 2022

Camera Maths, or Why The Crop Ratio Works

I still think there's something odd about APS-C cameras. 'Odd' means 'It doesn't quite do what my old full-frame film camera did with the same settings'.

Time for some maths.

We will work all these examples with a full-frame (35.8 x 23.8 mm) sensor with a 50 mm lens at f8, and an APS-C (23.6 mm x 15.6 mm) with a 35mm lens at f8. Shutter speed is on Auto.

Let's look at how much light is getting in.

The f-number is the ratio of focal length to the diameter of the aperture pupil. So the full-frame pupil diameter is 50/8 = 6.25 mm, giving an area of 3.142*(3.125)^2 = 30.7 sq mm. Which is a proxy for how much light is getting in. The APS-C will have a pupil diameter of 35/8 = 4.375 mm, giving an area of 3.142*(2.19)^2 = 15.0 sq mm. The APS-C has 15.0 sq mm of light falling on 368.2 sq mm of sensor, or 4.1%. For the full-frame, it's 30.7 / 852 = 3.6%.

Slightly more light per sq mm falls on the APS-C sensor. If we want the same, we will need 15.0*3.6/4.1 = 13.2 sq mm's worth of light, which means a pupil radius of sqrt(13.2/3.142) = 2.05 mm, a diameter of 4.1 mm and an f-number of 35/4.1 = 8.5. Or I could shorten the exposure by about 12%. Shutter speeds don't do that. ISOs don't either.
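Here's the same arithmetic as a Python sketch (mine, not the post's; it uses the full circle area pi*r^2 for the pupil):

```python
import math

def pupil_area(focal_mm, f_number):
    """Aperture pupil area in sq mm: pupil diameter = focal length / f-number."""
    radius = focal_mm / f_number / 2
    return math.pi * radius ** 2

# Light gathered per sq mm of sensor (a proxy, as in the text)
full_frame = pupil_area(50, 8) / (35.8 * 23.8)
aps_c      = pupil_area(35, 8) / (23.6 * 15.6)

# f-number the APS-C would need for the same light per sq mm:
# illuminance scales as 1/N^2, so N scales as the square root of the ratio.
matched_f = 8 * math.sqrt(aps_c / full_frame)
```

`matched_f` comes out at about 8.5, which no aperture ring offers; hence the "shutter speeds don't do that" complaint.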

Let's look at depth of field.

According to Wikipedia, depth of focus is roughly

2u^2 Nc / f^2

for a given circle of confusion (c), focal length (f), f-number (N), and distance to subject (u). The circle of confusion is conventionally 0.05 mm. For the full-frame lens and a subject 5 metres away, the depth of field is 2*(5000)^2 * 8 * 0.05 / 50^2 = 8000 mm, which is from 4m in front of the subject to 4m behind them.

The only thing that changes in this calculation when we switch to the APS-C sensor is the focal length, from 50 to 35. The depth of field changes to 16.3 m (!), from 8 metres in front to 8 metres behind. That's a whole lot of extra depth of field.
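The formula is one line of code, which makes it easy to play with other subject distances and lenses (a sketch of mine; all lengths in mm, as above):

```python
def depth_of_field(u_mm, N, c_mm, f_mm):
    """Approximate depth of field: 2 u^2 N c / f^2 (all lengths in mm)."""
    return 2 * u_mm ** 2 * N * c_mm / f_mm ** 2

# Subject 5 m away, f8, circle of confusion 0.05 mm
full_frame_dof = depth_of_field(5000, 8, 0.05, 50)   # -> 8000 mm
aps_c_dof      = depth_of_field(5000, 8, 0.05, 35)   # -> about 16327 mm
```

Only the focal length changes between the two calls, and the depth of field roughly doubles.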

Why does the APS-C have a shorter focal length and why that one? Everybody does this, but we should understand why.

The 35 mm sensor has an area of 35.8 * 23.8 = 852 sq mm. The Fuji APS-C has an area of 23.6 * 15.6 = 368.2 sq mm. So the APS-C sensor is 43% of the full-frame area. Or the 35 mm is 2.3 times larger than the APS-C.

The APS-C sensor is showing you a smaller part of the full-frame image for a given focal length. To get the same image with an APS-C lens, we have to have a shorter focal length (shorter focal length = wider and higher picture). How much shorter?

The answer involves some school geometry. (Graphics are not my strong point.)


The focal point of the lens is behind the sensor. (I know, I learned it at school, and I'm still nodding along. Optics is magic, not physics.) The distance between the focal point and the sensor is the focal length. (Which has nothing to do with the length of the lens. It's the length of the path the light takes between the front lens and the sensor: lenses with huge focal lengths are obtained by using mirrors. Lots of mirrors.) But I digress.

The key bit of camera geometry is the angle in red, called the field of vision. (Wrong notation in the picture.) School geometry says it is the angle $\theta$ such that

$\tan(\theta) = \frac{\text{sensor width}}{2* \text{focal length}}$

The magic maths is this: for a given distance $D$ away, the width of the picture that the sensor will capture is $2 D \tan(\theta)$. This is why you move forwards to get nearby things you don't want in the frame, or backwards to include more of them.

Since we want the same field of vision with the APS-C as the full frame, the angle stays the same, and we can set

$\frac{\text{sensor width}}{2*\text{focal length}}$ (full-frame) = $\frac{\text{sensor width}}{2*\text{focal length}}$ (APS-C)

or 35.8/100 = 23.6/(2 * APS-C focal length),

giving APS-C focal length = (23.6*100)/(2*35.8) = 33 mm.

Which is 2mm shorter than the industry-standard equivalent of 35mm.
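The crop-ratio calculation in code (a sketch of mine, using the sensor widths given above):

```python
def equivalent_focal(full_frame_focal, full_width=35.8, crop_width=23.6):
    """Focal length giving the cropped sensor the same horizontal field of
    vision: tan(theta) = width / (2 f) must match, so f scales with width."""
    return full_frame_focal * crop_width / full_width

f_apsc = equivalent_focal(50)   # about 33 mm, vs the rule-of-thumb 35 mm
```

The ratio `crop_width / full_width` is the reciprocal of the industry's "crop factor", which is all the crop factor ever was.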

(Now you know why nobody explains why you use the crop ratio.)

So using the same f-stop and the industry rule-of-thumb equivalent lens sizes, the APS-C gives us - for the same shutter speed - a slightly brighter, ever so slightly narrower picture with way more depth of field than a full-frame. The difference in brightness is not enough at sensible ISOs to affect the shutter speed, even if you have shutter speed or ISO on auto, so it will stay slightly brighter.

The ISO / shutter speed is way too coarse to adjust for the small change in brightness. But that depth of field can be adjusted, by halving the f-stop you would use on a full-frame, and letting the camera make the shutter speed adjustment.

Tuesday 30 November 2021

Lightbulbs and the Poynting Vector (Veritasium)

Electro-magnetism (E&M) is genuinely weird. Most people never find this out, because most people never go into electrical engineering or a physics PhD (where you really have to grapple with it).

Most people think of electricity as volts, amps and watts. Maybe ohms. We don't use ohms in a household context.

The initiated talk about capacitance, reactance, inductance, resistance and conductance. They talk about "transmission lines", "skin effects", and "antennas". The rest of us need to be electrical engineers before all that makes sense. (Oh. Wait. I almost was one.)

Here's a way in: metals like copper are often called good "conductors of electricity", as if electricity is something that passes through the metal. Instead, think of metals as good receivers of electromagnetic radiation. Wires do not in some sense channel or concentrate the electromagnetic fields, or act as pipes for electrons to flow along; they respond to the electromagnetic fields. Indeed, everything responds to electromagnetic fields. Mostly not much.

Wires respond by creating their own little electromagnetic field around them. Most materials (because "everything is a capacitor") respond by retaining tiny, tiny amounts of charge which they then eventually let go of. Air does this. So does polyester, which is why it crackles when you take it off.

Mr Veritasium set up a circuit with a battery, a switch, a light opposite the switch and some very long wires connecting everything. The idea was that the wires would be so long it would take a noticeable amount of time for the electricity to "flow" along the wires and power the light.

Except the light comes on instantly.

His explanation uses a thing called the Poynting vector. Do not use those words near physicists, as they may call your bluff.

Electromagnetic waves have an electric field (E) and a magnetic field (B) that are always in phase and at right angles to each other, and to the direction of travel of the wave. (This is why there have to be at least three physical dimensions, or we couldn't have Radio Three.) Since electromagnetic waves carry energy, it makes sense that there should be an energy vector corresponding to the wave in the direction of travel. Poynting proved that this vector is (*) S = E x B, where 'x' is the vector cross-product, and I've left out a constant of proportionality. It's the E times B bit that is the achievement, not the direction (cross product), because we got that already from the physics.
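You can check the right-hand-rule bookkeeping with a toy cross product (my sketch; units and the constant of proportionality are left out, as above):

```python
def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

E = (1, 0, 0)    # electric field along the wire (x)
B = (0, 1, 0)    # magnetic field pointing up (y)
S = cross(E, B)  # Poynting direction: perpendicular to both (z)
```

With E along the wire and B pointing up, S points sideways, towards the bulb, which is the whole of the next paragraph.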

So Mr Veritasium said, the electric field is pointing this way (points along wire) and the magnetic field is pointing that way (points upwards) so the S vector must be (pointing at the light bulb). Presto! The energy gets to the light more or less instantly.

Which convinced absolutely nobody, because they piled in to discuss this using anything but Poynting vectors. Transmission lines and displacement currents were a favourite, because, well, engineers. Nobody was doubting the Poynting explanation (physics > engineering), but being engineers, they wanted to explain it in terms of something more familiar and material.

Complete the well-known phrase or saying: cart, horse.

Poynting's insight was that the materials in which the wave moves (e.g. air, wires) do not facilitate the power transmission, rather they modify the electro-magnetic waves, and hence the power transmission. The transmission-line / displacement current explanations are consequences of the transmission of power in the direction of the Poynting vector, not explanations. When modelling a specific setup, the B vector (which is for free space) is replaced by the H vector, which takes into account the effect on the B vector of the materials involved.

What happens is this: when the switch is closed, a voltage pulse starts to travel round the circuit. This creates a magnetic field B through the Maxwell equation (with J = 0) for B

$\nabla \times B = \mu_0 \epsilon_0 \frac{\partial E}{\partial t}$

which creates a Poynting vector S. Behind that pulse comes the first lap of a current J that will be circulating once the circuit is in a steady state. That sustains the B field by the Maxwell equation (with $ \frac{\partial E}{\partial t} = 0$) for B

$\nabla \times B = \mu_0 J$

which sustains the Poynting vector S. (The E field is sustained by the battery voltage). That S field carries the energy that excites the molecules in the wire in the bulb and creates the light. Because the wire in the bulb is a good receiver of electromagnetic radiation.

Not a transmission line in sight.

It's worth noting that if the bulb was put, say 300,000,000 metres away from the switch on the opposite side of a loop, then it would take 1 second for the bulb to light, but that would still be faster than the roughly 1.6 seconds it would take for the voltage / current wave-front to reach it.

This is, of course, handwaving. More precise calculations would take account of the dielectric air between the wires to calculate H and also factor in the displacement currents, but the principle remains the same. That would start to sound like engineering. But the engineering is there to help perform the calculation, not to help understand what's happening.



(*) Strictly, B and S are not vectors, which are 1-forms, but flux densities, which are 2-forms. This is the only time in your life you will ever read that.

Friday 26 November 2021

Philosophy of Mathematics - Number Theory

Off in another part of my thoughts, which have been on hold for a while, I have been trying to work out some ideas on the philosophy of mathematics.

I have two theses. One is about the relationship of abstract mathematical ideas to various types of measurement or geometric properties. If you want to know how the various derivatives on curved spaces arise from the simple issues of co-ordinate changes, it's all there. The other is a methodological thesis, that the purpose of mathematics is to provide tools and techniques to solve problems that arise from modelling physical and other processes, and to understand the scope and limits of those techniques. Creating and solving the equations of the mathematical models is what's usually called "applied mathematics", while understanding the scope and limits of the techniques is a lot of what's called "pure mathematics".

And then there's Number Theory. Which is about numbers. Not mathematical models.

You know that Langlands thing that all the Kool Kids are working on?

Yep. Number theory. Finite field number theory at that. Geometric Langlands is even more abstruse.

It takes genius-level insight and technique to understand the more recent developments in Langlands. That's the point: if the specialists can barely follow it, how is it going to be any use to some poor post-grad working on differential geometry at the University of Ennui-sur-Blase?

The social purpose of mathematicians is to teach other people - physicists, statisticians, epidemiologists, computer scientists and programmers for example - how to use the problem-solving techniques mathematics offers. What mathematicians do in their spare time is their business: they need a decent laptop, a whiteboard and some paper and pens: math is cheap compared to fundamental physics.

The Langlands guys can do what they want in their spare time. But it's a rabbit-hole. Maybe it's a big, well-lit rabbit-hole with all the health and safety gear and plenty of mechanical digging tools, but it's still a rabbit-hole. Unlike some of the rabbit-holes mathematicians have buried themselves into (functional analysis, for instance), Langlands is not going to produce anything useful to regular working stiffs (for instance, functional analysis produced the theory of weak solutions to differential equations, which is very useful). I feel confident saying that because Langlands is about structures the rest of mathematics just doesn't use.

(Rabbit-holes are as opposed to specialisms, which are very specific subjects that have useful applications in the real world or other parts of maths with real world applications. Like research in PDEs.)

Maybe "rabbit-hole" should be a term of art in methodology. It's a line of research that has no obvious application to any existing problems or in other branches of maths. The scientific version would be a research programme that was making theoretical progress but no empirical progress (was not making new predictions). A rabbit-hole may branch up to the surface every now and then, as applications to problems in other branches of maths are found, but generally once dug, the researchers dig away happily underground.

In this case I would be saying that Number Theory is a mathematician's pastime, and that other very abstruse or very off-beat programmes are, for all their sophistication, esoterica for the aficionados. Which doesn't sound too dramatic.

Monday 29 January 2018

On Probability Theory and Theories of Probability


(This expands on some ideas I touched on in the post about the single-event probability fallacy. If you have a sense of deja vu, that’s why. It’s a different angle on the same ideas.)

Probability theory is abstract mathematics. It has the same axioms as measure theory (plus one that says the measure of the whole space is 1, but that’s really just a convention), and it focuses on different things. As a theory, it has applications.

One is to the frequencies of outcomes of repeated events, such as rolling a dice, making a component by machine tool, or the path of a small particle surrounded by fast-moving smaller particles. With a suitably set-theoretic understanding of what ‘events’ and ‘outcomes’ are, probability theory can be shown to apply to such frequencies.

Another application is to betting odds, though here probability theory does not apply as a description but rather as a prescription. If the betting odds are to be ‘fair’, that is, if the odds don’t favour the bookmaker or the customer, those odds must follow the laws of probability.

The same applies to the idea of ‘degree of belief’, whatever that means and however we measure it. If those degrees of belief are to be consistent, they must follow the laws of probability. Betting-odds and degrees of belief are sometimes called subjectivist probability.

In earlier and less enlightened times, there were heated arguments over which was the ‘real theory of probability’, and both sides missed the point that they were discussing different applications of the same abstract theory, and as a result were having an argument about whether over-easy or well-done was the correct way of cooking eggs.

In addition, there was something called ‘The Principal Principle’ stating that the rational degree of belief in the outcome of a repeated event is its frequency. The result is that, if we are talking about repeated outcomes, probability means frequencies.

This leaves the question about what we might mean by the probability of single events and how it might be measured. The ingenuity of some answers rivals the madder interpretations of Quantum Mechanics. Some of them turn out to be frequencies in disguise, as is the Possible Worlds interpretation. (I’m not going to describe that: it’s like the Multiverse and just as non-empirical.) It’s not that those interpretations don’t work: it’s that only about forty people at any given time can understand them, and none of them work as statisticians. So whatever the working statisticians might mean, it’s not what the ingenious people suggest.

Personally, I think that phrases like ‘I don’t think that’s very likely’ or ‘I wouldn’t be surprised’ or ‘That’s probably what happened’ are figures of speech, referring, if to anything, to something that does not have to obey the probability calculus. There is no obligation on the figurative speech of ordinary people to obey rules made up by mathematicians. People do believe things, and that belief may be a bodily sensation, as the disappointment of a belief often is. Maybe those figures of speech are about the strength of those belief-sensations. We can, of course, say that if those belief-sensations are to be rational, they need to obey the probability calculus, but what we can’t say is that if they don’t, then ordinary people should not use probability-words to express their beliefs. Ordinary language got there first.

Similar issues affect the idea of the expected value. The expected value, or expectation, or prevision in French, or average in GCSE arithmetic, is a mathematical construction. It’s the sum of the probability-weighted outcome values. The formal expected value of a single roll of a fair dice is 1/6+2/6+3/6+4/6+5/6+6/6, which is 3.5, and that’s never going to appear on any roll of a six-sided dice. (A fair dice has no modal value - or perhaps it has six - and its median is any value between 3 and 4: half the throws will be below it and the other half above.)

In a game with payoffs of £0 and £100, with equal odds, the expected value is £50, but that will never be the result of an individual trial: the payoffs are £0 or £100. It is what we would expect to be the long run average value of the payoff per trial. However, an actual sequence of trials that ever reached and stabilised at £50 after a ‘reasonable number’ of trials would be quite rare: what we should really expect is that the actual average payoff per trial should appear to converge to £50 as the number of trials increased. Measuring an expected value in practice is much more complicated than calculating it.
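Here's a quick simulation of that £0-or-£100 game (a sketch of my own, not part of the original): no single play ever pays £50, but the running average wanders towards it as the trials pile up.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def average_payoff(trials):
    """Average payoff of a 50-50 game paying 0 or 100 over `trials` plays."""
    total = sum(random.choice((0, 100)) for _ in range(trials))
    return total / trials

# No single trial ever pays 50; the running average only drifts towards it.
short_run = average_payoff(10)
long_run  = average_payoff(100_000)
```

With only ten plays the average can only ever be a multiple of £10; it takes a long run before it looks anything like £50, which is the measurement problem in the paragraph above.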

We can always make a formal calculation and, rightly, call that the expected value. But we must ask how that value is to be measured, and if it can’t be, or only has a meaning in some series of counter-factual logical universes, then it remains a formal calculation with no practical application. We can calculate the expected value of a one-off event, but we can’t measure it. Measuring expected values is a process that refers implicitly to a run of outcomes. The formal calculation for a single event is correct, but formal correctness is no guarantee of empirical application.

Since the formal expected value of our game has no empirical meaning for one event, it can’t be a guide to any decision we make. This has, as I’ve discussed before, some consequences for so-called rational economics.

Thursday 25 January 2018

The Tit-For-Tat Conjecture

Suppose you and I are going to play a co-operation game. There are many strategies for these, but the most beneficial is tit-for-tat: start by co-operating, then repeat the previous move of the other player. It’s simple, and while it doesn’t dominate all the others, it does give the maximum reward. If I know you’re going to use it, I may as well climb on board for those maximum rewards as well. Tit-For-Tat is a fairly rare strategy: it works even when the other person knows you’re using it. In fact, it works especially when the other person knows you’re using it.
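The strategy is short enough to write down in full. A minimal sketch (the move encoding and the `play` harness are my own illustration, not from the game-theory literature):

```python
def tit_for_tat(my_history, their_history):
    """Co-operate first, then copy the opponent's previous move."""
    return 'C' if not their_history else their_history[-1]

def play(strategy_a, strategy_b, rounds=10):
    """Iterated game with simultaneous moves; returns both histories."""
    a_hist, b_hist = [], []
    for _ in range(rounds):
        a = strategy_a(a_hist, b_hist)
        b = strategy_b(b_hist, a_hist)
        a_hist.append(a)
        b_hist.append(b)
    return a_hist, b_hist

a_moves, b_moves = play(tit_for_tat, tit_for_tat)
```

Two Tit-for-Tat players lock into co-operation from the first move, which is where the maximum reward comes from.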

One that fails if it’s public knowledge is the Secretary Strategy. In this, an employer is hiring, and has a limited time to pick a new person. It turns out that the most effective strategy is for them to look at the first third of the candidates, and then hire the first candidate better than all the ones they have already seen. This will get them the best candidate in 37% of hires. Some, of course, will never hire anyone, because the best was in the first third. It’s not a reliable strategy.
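The 37% figure is easy to check by simulation (a sketch of mine; candidates arrive in random order, which is exactly the assumption the agency breaks):

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def secretary_hire(candidates):
    """Observe the first third, then hire the first candidate better than
    everyone seen so far. Returns the hired score, or None if nobody is."""
    cutoff = len(candidates) // 3
    best_seen = max(candidates[:cutoff])
    for score in candidates[cutoff:]:
        if score > best_seen:
            return score
    return None  # the best was in the first third: nobody gets hired

def success_rate(n=100, trials=20_000):
    """Fraction of trials in which the very best candidate is hired."""
    wins = 0
    for _ in range(trials):
        candidates = random.sample(range(n), n)  # random arrival order
        wins += secretary_hire(candidates) == n - 1
    return wins / trials

rate = success_rate()
```

`rate` lands near 0.37, and a noticeable fraction of runs hire nobody at all, which is the "not a reliable strategy" caveat.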

In this economy, recruitment is done through agencies, and they get to know the habits of the recruiter. If the agency know the employer uses the Secretary Strategy, they will arrange for the employer to see lesser-quality applicants at first, so they can place a reasonable one rapidly. The Secretary Strategy fails because the employment agent invalidates one of the assumptions, which is that candidates arrive at random. But then that’s the point of strategies and gaming. The only way out for an employer is to recruit directly, like they used to. Even then, in a small world, which some industries are, an interviewee finding she is the first might politely decline, on the grounds that ‘everyone knows you never hire the first person you see’. This denies the employer the calibration opportunity that the Strategy provides. The only way out of that is to lie to the candidates about their place in the queue: that’s not such a smart idea.

Most strategies are like this: they work as long as the other side don’t know. What makes Tit-For-Tat different? The Secretary Strategy predicts the future behaviour of its user, which allows others to game it. Tit-For-Tat can also be predicted, but the prediction is based on the other person’s behaviour, not its user’s intentions.

Strategies are, amongst other things, formalised intentions. If we know the strategy, we have a good idea about the objectives it is intended to achieve, and if we know that, we can make more informed guesses about the other ploys the other side might use.

Here’s a Conjecture: any strategy that works even when the other side knows you’re using it is equivalent to Tit-for-Tat.

If this is true, the immediate consequence is: unless you’re using Tit-for-Tat, your strategy can be gamed to your disadvantage.

Monday 9 October 2017

Monty Hall - Stick or Switch? It Depends How Often You Can Play

The Monty Hall problem is back in the news, or at least the weekend edition of the Financial Times, again, I think because Monty Hall died recently. Here’s the problem:
You’re on a quiz show with a host, Monty. There are three doors A, B and C. Behind one door is a car, and behind the other two a goat. You get to nominate a door, and then Monty will open one of the other doors and ask you if you want to change your choice. What you know is that Monty never opens the door with the car behind it. Never. Should you change your choice?
The answer, given by Marilyn vos Savant, is that you should, as in two-thirds of the cases, you will win the car. When she gave that answer, the wrath of a zillion statisticians and mathematicians descended on her. Here’s her argument: there are three options (in order A, B, C)

  1. Car Goat Goat
  2. Goat Car Goat
  3. Goat Goat Car 
If you pick A, you lose by switching in option 1 and win by switching in options 2 and 3. By symmetry the same two-out-of-three holds whichever door you pick. Take the odds and switch. At least when you have the opportunity to play the game over and over.
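Simulating the repeated game (a sketch of my own) bears out the two-thirds figure for a long run of plays:

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def monty_hall_trial(switch):
    """One round: car placed at random, player picks door 0, Monty opens a
    goat door the player didn't pick. (When both unpicked doors hide goats,
    Monty's choice is fixed here; that doesn't affect these strategies.)"""
    car = random.randrange(3)
    pick = 0
    monty = next(d for d in (1, 2) if d != car)
    if switch:
        pick = next(d for d in (0, 1, 2) if d not in (pick, monty))
    return pick == car

def win_rate(switch, trials=30_000):
    return sum(monty_hall_trial(switch) for _ in range(trials)) / trials
```

Over many plays, `win_rate(True)` sits near two-thirds and `win_rate(False)` near one-third.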

What happens when you can only play once? Choose A and suppose that Monty opens door C to show a goat. Now you know there are only two options:

  1. Car Goat Goat
  2. Goat Car Goat
In this case, the odds are 50-50 for switching. Why? Because you don’t have the third option of Goat-Goat-Car, which would force Monty to open door B.

Play the game over and over, and switching will win more often. Play once, and it’s a flip of the coin, so you may as well switch, since the odds are the same. There’s a winning strategy for multiple plays, but not for a single play.

Damn that’s clever.

Statistics is not only hard, it also only applies when you can repeat the experiment.

What about all the other arguments, including one quoted on Wikipedia that says this;
By opening his door, Monty is saying to the contestant 'There are two doors you did not choose, and the probability that the prize is behind one of them is 2/3. I'll help you by using my knowledge of where the prize is to open one of those two doors to show you that it does not hide the prize. You can now take advantage of this additional information. Your choice of door A has a chance of 1 in 3 of being the winner. I have not changed that. But by eliminating door C, I have shown you that the probability that door B hides the prize is 2 in 3.’
Here’s the mistake: "the probability that the prize is behind one of them is ⅔” should read “the probability that the prize is behind one or other of them is ⅔”. No argument that tries to establish that switching always gives a 2:1 advantage can be right, because when you can only go once the odds are 50-50.

On a one-shot play, sticking is as good as switching.

And in the TV show, you only got one shot.

Monday 28 November 2016

Newcomb's Problem

This appeared in the Guardian recently.
The problem: two closed boxes, A and B, are on a table in front of you. A contains £1,000. B contains either nothing or £1 million. You don’t know which. You have two options: Take both boxes, Take box B only. You keep the contents of the box/boxes you take, and your aim is to get the most money.

But here’s the thing. The test was set by a Super-Intelligent Being, who has already made a prediction about what you will do. If Her prediction was that you would take both boxes, She left B empty. If Her prediction was that you would take B only, She put a £1 million cheque in it.

Before making your decision, you do your due diligence, and discover that the Super-Intelligent Being has never made a bad prediction. She predicted Leicester would win the Premier League, the victories of Brexit and Trump, and that Ed Balls would be eliminated from Strictly Come Dancing. She has correctly predicted things you and others have done, including in situations just like this one, never once getting it wrong. It’s a remarkable track-record. So, what do you choose? Both boxes or just box B?
This is supposed to puzzle people. And puzzles that don’t seem to have a decent answer usually arise because they aren’t a decent question. Anyway, it originated with a physicist - a descendant of the brother of the famous Newcomb - and was popularised by Robert Nozick, and then Martin Gardner at the Scientific American. See where I’m going with this?

Suppose I say to a bookie: if I think Fancy Girl will win the 2:30, I will bet £100, and if I think Blue Boy will win, I will bet £50. His reply would be: all right, which is it? I can’t place a bet that’s conditional on what I think will happen: the whole point of a bet is to pick one of the outcomes. The closest I can get to making a conditional bet is to put money on each outcome, and if the bookies are doing their job well, I will lose doing that.

What you want to do is this:
If I choose Box B alone, she will have predicted that and put the cheque in it. But if I choose both boxes, she will have predicted that and not put the cheque in. So I should choose Box B.
This assumes what the Special Theory of Relativity tells us cannot happen, that a future event can cause a past one. So let’s try this:
If she predicted that I would choose Box B alone, then she put the cheque there, and I should choose it. If she predicted I would choose both boxes, then she wouldn’t have put the cheque in Box B, so I should choose both boxes, because at least I’ll get £1,000.
The catch is that doesn’t tell you what to do, since you don’t know what she predicted and so can’t detach the consequents from the conditionals. The next one is silly...
If she predicted that I would choose Box B, then she put the cheque there and I should choose it. If she predicted I would choose both boxes, then she wouldn’t have put the cheque in Box B, so I should not choose both boxes, only Box B.
That sounds good, but since there’s no cheque in Box B, you get nothing. But what you were going to do was this:
Suppose I choose Box B. Since her predictions are perfect, she predicted that and the cheque is there. But if I choose both boxes, again since her predictions are perfect, the cheque isn’t there. So I choose Box B.
This doesn’t require backwards-causality, but it does require someone to ensure the predictions are perfect. Russian hackers, presumably.(*) What we’re told is that she’s good, not that the game is rigged.(**) Now try this:
If she predicts Box B and I choose Both, I get the cheque. If she predicts Both and I choose B, I get nothing. If she predicts Both and I choose Both, I get £1,000. If she predicts B and I choose B, I get the cheque. So if she predicts B, I get the cheque no matter what I do, and if she predicts Both I lose if I choose B. So I take Both Boxes.
Those are the actual options assuming free will and imperfect predictions. The only way you get confused is to assume a) that her predictions are causal, or b) that your actions are temporally-backwards causal, or c) that someone is rigging the coincidence between her predictions and your actions.
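That dominance argument can be laid out as a payoff table (a sketch; the amounts follow the Guardian’s set-up, with the £1,000 from Box A included when you take both boxes):

```python
# Payoffs in pounds, keyed by (her prediction, your choice),
# assuming an imperfect, non-rigging predictor as argued above.
payoff = {
    ("predicts B",    "take B"):    1_000_000,
    ("predicts B",    "take both"): 1_001_000,  # the cheque plus the £1,000
    ("predicts both", "take B"):            0,
    ("predicts both", "take both"):     1_000,
}

# Whatever she predicted, taking both boxes pays at least as much:
dominated = all(payoff[(p, "take both")] >= payoff[(p, "take B")]
                for p in ("predicts B", "predicts both"))
```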

So how seriously should you take her past performance on predictions? This starts to make it sound like we might want to use Bayesian inference, and indeed the Wikipedia entry for this problem lists David Wolpert and Gregory Benford as having a Bayesian analysis showing that the different arguments arise from different models of the assumptions, so that there isn’t a real paradox, just an old-fashioned ambiguity.

The real reason you choose both boxes in the Guardian’s example is this: it’s the only way you get anything. She’s a woman: the point was to get you to choose Box B, and now you have, by Briffault’s Second Corollary, she doesn’t have to give you the money, so she cancelled the cheque (***).

(*) Topical political joke.
(**) Another topical political joke.
(***) Robert Briffault

Thursday 17 November 2016

A Mathematical Joke

Why doesn't the Hamiltonian (operator) live in the suburbs?

Because it doesn't like to commute!

(Boom-tish!)

We're here all week folks!

(This was told me by a colleague at work, who says he made it up at university.)

Monday 6 June 2016

Never Mind the Proof, Why Is The Riemann-Roch Theorem True?

A very long time ago, I began a project to understand the modern theory of algebraic geometry, and specifically the proof of the Riemann-Roch theorem for projective curves. It's finally over. The completed paper, Never Mind the Proof, Why Is The Riemann-Roch Theorem True? is available here.

Why should you read it? Because it will actually explain why the theorem is a) difficult in the first place, and b) true. You won’t drown in endless algebra, rather swim in a sea of geometric intuitions. You will see the Zariski topology being used to provide geometric insight and understand why flatness is at once difficult and yet easy. You will thoroughly understand the difference between a vector, a co-vector and a one-form (a lot of people who write textbooks don’t) and so why a global holomorphic one-form doesn’t give rise to a global holomorphic function. There’s a simple geometric way of thinking about spectrums and sheaves, and an explanation of where twisting sheaves really come from. It’s all about the informal illustrations, arguments and analogies.

It's inspired by Imre Lakatos' championing of informal mathematics, especially in his essay Proofs and Refutations. The aim is to show that informal argument and exposition can lead to greater understanding of abstract ideas and complicated proofs. I used the Riemann-Roch theorem for Riemann surfaces and projective curves because it provides an example of the informal approach in action on a deep but accessible theorem, rather than a toy example.

It has my real name on it and I did think about that. W S Gosset, aka “Student”, was the only serious mathematician to rock a pseudonym (Nicholas Bourbaki might be someone’s real name, oh wait…) and it’s just pretentious for me. If I was someone whose name appears in the Financial Times, maybe a pseudonym might be appropriate, but I’m just another hack in an open-plan office. Nobody I work with or know is ever going to Google “Riemann-Roch” and find this page by chance. Everyone else will be a stranger - but welcome - now and in the future, so my real name is much the same as a pseudonym.

Thursday 7 April 2016

Piper Harron’s Identity Politics

Piper Harron is a black woman who feels oppressed by mathematics. She has a PhD from Princeton and is married to a mathematics professor at the University of Hawaii. Her PhD is written in an informal style that crosses the border into cute a few times. She's been interviewed by no less than MathBabe Cathy O'Neil and Michael Harris has talked about her at least twice on his blog.

Here’s an extract from a post called Why I Do Not Talk About Math
“My experience discussing math with mathematicians is that I get dragged into a perspective that includes a hierarchy of knowledge that says some information is trivial, some ideas are “stupid”; that declares what is basic knowledge, and presents open incredulity in the face of dissent. ”
Translation: other people have strong ideas about what’s worth spending time on, they don’t hold back about them, and I get upset by that.

To which the reply is: woman up, behave like an adult and join the community, or quit. Because that is going to happen to her wherever she goes. In some places they may be more polite about it, and then she will finish the year with a “struggling” grade in her appraisal, which she will know is their way of telling her to be employed elsewhere.

Attention-seekers feel oppressed by lack of attention. They don't want attention for what they have done, but for who they are, or perhaps for the fact that, being who they are, they have done what they have done. Attention-seekers take to identity politics like cats to catnip: it gives them so many ways to define the "being who they are" that makes their otherwise journeyman work attention-worthy.

And if Ms Harron thinks a bunch of nerds in a math seminar are bad, she’s going to get the shock of her life when she tries to fit in with the other mothers at the school gate. Then she will know scorn and rejection.

Harron’s affiliation with identity politics is a shame. Because she’s on to something with the style of mathematical papers and communications. Fifty years after Imre Lakatos’ Proofs and Refutations, a lot of mathematicians still write like Bourbaki. That’s something worth writing about.

Monday 22 February 2016

Homotopy Type Theory

I’ve had the Univalent Foundations Program’s book on Homotopy Type Theory on my to-read list for quite a while after reading about the project on Michael Harris’s blog. For some reason, recovering from a nasty fever was the exact right time to skim-read the bits I would know about, viz, the Introduction, and the chapters on Set Theory, Category Theory and real numbers.

Call me a rude mechanical, but I’ve always thought that people who go in for type theories have missed a number of points. Yes, I do know that Homotopy Type Theory is currently the subject of active research by people who are cleverer in their sleep on a bad day than I am awake on a good day, but as someone once remarked about the “highly motivated individuals” that were popular in recruitment in the 1970s, the catch is that they can all highly motivate themselves up a gum tree.

Type theory was an ugly kludge invented by Bertrand Russell to get round the fact that, in its unrestricted form, the Axiom of Comprehension leads immediately to inconsistencies of which Russell’s Paradox is the most famous. The "set of all sets that are not members of themselves” looks like a well-formed definition, but now consider that very set. If it does belong to itself, it doesn’t, and if it doesn’t, it does. Russell’s kludge was to stratify logical formulae into “types”, and impose the rule that a set could only belong to a set of higher type than itself. It worked well enough for him to finish the project of showing that mathematics could be developed from “purely logical foundations” that was the aim of his massive Principia Mathematica.
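The paradox is short enough to run. A hypothetical rendering in Python, with predicates standing in for sets and application for membership: asking whether the Russell predicate satisfies itself demands that russell(russell) equal its own negation, and the interpreter chases that contradiction until it runs out of stack.

```python
# "The set of all sets that are not members of themselves",
# with predicates standing in for sets and application for membership.
def russell(x):
    return not x(x)

# russell(russell) would have to equal not russell(russell);
# evaluating it recurses forever and raises RecursionError.
```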

A few years later, however, Zermelo, Fraenkel and Skolem devised the current axioms of mainstream set theory, traditionally called ZFC “Zermelo-Fraenkel with (the Axiom of) Choice” (for some reason Skolem’s name always gets left out). The foundationalist programme became “mathematics is derivable from the axioms of ZFC and mathematical logic” instead of “mathematics is derivable from the Axiom of Comprehension, type theory and mathematical logic”. A lot of people are very happy with that, including me. The point of the foundational programme was to show that such a derivation was possible, not to argue that fractions were really ordered pairs of massively nested copies of the empty set. Once we have ZF, we don’t need the kludge that is types.

And there it should have died. Along with the biplane, the TOG tank, the Sinclair C5, airships and programmable calculators. All may have been wonderful and useful once, but the world has moved on. And the same goes for mathematical theories, which are developed to solve problems. Cantor’s set theory was not an attempt to fabricate a “new language for mathematics” but an attempt to understand the limit points of Fourier Series. It so happened it let other mathematicians re-state other theories in a clearer and more systematic way, which was why it was adopted so quickly. We still use it because it’s still the best way of stating many definitions and theorems. But as a subject on its own?

In its own right, set theory is interesting for a) large cardinal theory, or b) Cohen forcing constructions for independence proofs and proving the existence of weird objects. These are not going to make your e-mails any safer or your pictures any less fuzzy any time soon. People work on set theory as they work on model theory, both of which John Bell drummed into me back in the day, but I’m not going to sell you on the commercial benefits of saturated structures (generalisations of the idea of algebraically-closed fields). It’s interesting to some people, but it’s a creek off the main river of mathematics. The same goes for any foundational subject.

Category theory is foundational in that sense. It was devised to formalise proofs and constructions that occurred in multiple branches of mathematics, and to formalise the “X and Y have different fidget groups, but fidget groups are preserved under twiddles, so X and Y are not twiddle-equivalent” arguments that were appearing in algebraic geometry at the time. There’s some quiet satisfaction from the moment when you realise that an SQL inner join is really a pullback in disguise, but that knowledge does not make you a designer of more stylish queries. In the same way, just because you can show that a folklore Haskell programming trick actually illustrates Kan extensions, it doesn’t mean that knowing anything about Kan extensions will make you a slicker programmer. Academic computer scientists love their Haskell and their category theory, but if either was a pre-requisite for a job at Google, Business Insider would have run an article called “Here’s the far-out math theory you need for a job at Google” a long time ago.
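To make the inner-join remark concrete, here’s a minimal sketch (the table names and columns are made up for illustration): the pullback of f: A → C and g: B → C is the set of pairs that agree in C, which is exactly what an inner join computes when f and g project out the join column.

```python
# Pullback of f: A -> C and g: B -> C: all pairs (a, b) with f(a) == g(b).
def pullback(A, B, f, g):
    return [(a, b) for a in A for b in B if f(a) == g(b)]

# Hypothetical tables: employees and departments, inner-joined on dept id
employees = [("alice", 1), ("bob", 2)]
departments = [(1, "maths"), (3, "physics")]
joined = pullback(employees, departments,
                  lambda e: e[1],   # employee's department id
                  lambda d: d[0])   # department's id
# joined == [(("alice", 1), (1, "maths"))]
```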

And there’s one more thing. Type theories have, in general, a non-classical logic! (Except at the “-1” level, where you can do classical logic.) Would you have guessed? I have nothing against the study of multi-valued and modal logics, though again, I’m not sure I want my taxes to pay for it. I get (or did at the time I read about it) why it appears naturally as the “natural logic” of certain categories of sheaves, but that’s no more profound than saying that the “natural logic” of non-complemented lattices is non-classical, and nobody thought to do that. For some reason failure of the law of the excluded middle is seen as some kind of abstract virtue and I can’t help hearing alarm bells when it is so presented. It’s something to do with hair-shirts, I think. Maths goes better with the occasional non-constructive proof by contradiction.

I have no problem with what consenting mathematicians choose to talk about in the privacy of their conferences, though if it were me, I wouldn’t use a lot of taxpayers’ money to fund research into Univalent Foundations. Voevodsky is selling it as a theorem-prover, and that will always get some attention, but you and I know that it wouldn’t help much even if it did produce an effective theorem-prover. Types can only capture a certain class of errors, not something subtle everyone has so far missed about (say) Cohen-Macaulay rings over finite fields.

So do you read the book and follow the work? Look, some people still swear by the λ-calculus for dealing with functions. I know it works. But anyone who actually used a λ-function in actual production code would find their code re-factored to get rid of it at the first opportunity. Ditto types: I’m sure the maths is impeccable. It’s the project that’s a little pointless.

The whole foundations thing was done in the early twentieth century. The point of mathematics is to solve problems, and while most of those problems still come from physics or gambling (sorry, probability theory), some now come from cryptography, computing, biology and other sciences. Most are, in the end, to do with solving differential or difference equations. I’m going out on a limb here, but I’m pretty sure no-one is going to improve image-enhancement techniques with higher homotopy types.

Monday 25 May 2015

Dumbing-Down GCSE Maths - Again?

The Guardian had another “tough maths question” on Friday, to accompany an article about how the GCSE exam boards were being asked to dumb down (sorry, “make the exam accessible to pupils of all abilities”). We’ll pass that one over, because the fact the request was leaked means that even the exam boards think it’s ridiculous.

So here’s the question:


And here’s the answer:

Angle TAP is a right angle, because PTN is an equilateral triangle (all sides equal) and A is half-way along, so PA bisects the angle at P and must do so perpendicularly. OP can be calculated from Pythagoras: sqrt(90^2 – 40^2) = 80.62. AT is 20cm long and is a leg of the right triangle ATP, whose hypotenuse TP is 40cm, so AP = sqrt(40^2 – 20^2) = 34.64. We know OP and AP, so tan(OAP) = OP/AP = 2.327 and angle OAP = arctan(2.327) = 1.165 radians = 66.75 degrees.

The question is two applications of Pythagoras’ theorem, one of SOHCAHTOA and one of the properties of equilateral triangles. The question points towards the solution by a) asking you to calculate angle TAP and then b) asking you to calculate AP. In the context of GCSE maths, you can only calculate AP if angle TAP is a right angle. They don’t do the general version of Pythagoras (the cosine rule). That’s a clue right there.

What makes it difficult is that the prompts in the question only take you half-way there. To get angle OAP, you need its sin, cos or tan, and you can’t read those off from the question. Because OA is not as long as ON. (Following the hint that TON is an isosceles triangle will take you up the garden path.) We know AP so we need OA or OP. OPT is a right triangle with two known lengths (PT and OT), so we calculate OP. This gives us the Opposite and the Adjacent of angle OAP, and that’s its tangent. Now find the arctan on your calculator.
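The whole chain of calculation, using the lengths from the worked answer above, fits in a few lines:

```python
import math

OT, TP, AT = 90.0, 40.0, 20.0   # lengths (cm) used in the worked answer

OP = math.sqrt(OT**2 - TP**2)   # Pythagoras in right triangle OPT
AP = math.sqrt(TP**2 - AT**2)   # Pythagoras in right triangle ATP
angle_OAP = math.degrees(math.atan(OP / AP))   # TOA: tan = opposite/adjacent
# OP is about 80.62, AP about 34.64, and angle_OAP about 66.75 degrees
```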

It’s the need for sustained reasoning, for spotting the false starts, and for solving the problem of the missing bit of information, that makes this a difficult question. It’s not the maths that’s hard - this is Year 8 at most - but the ability to perform sustained reasoning and problem-solving.

Most people can’t do that, any more than most people can run five-minute miles or deadlift 200+ lbs. So there are two things here: the first is to sort out the young people who show some aptitude for it, so they can be pointed to subjects where it is needed; the second is to design a syllabus and examination that gives the rest of the world something useful. Even if you can’t deadlift 200lbs (I can’t) you can still be taught useful exercises. Even if you can’t conduct a chain of reasoning, you can still be taught to do basic numeracy, estimation, ratios and comparisons.

My memory is that, one year after doing O-level maths, and so half-way through an OND in engineering, I and everybody else on the course looked at an O-level paper and realised it was trivial compared to what we had learned since. How had we ever thought it was hard? That was when the O-level included calculus, and most of us knew about “imaginary numbers” and had done ever since learning the formula for solving a quadratic equation. Back then the maths teachers used to say they thought that including complex arithmetic in the O-level was only a couple of years away. Well, we’ve regressed a lot since then.

Thursday 16 April 2015

Cheryl's Birthday and Other Trick Maths Questions

So this went viral this week.


I hate these things. But I buckled down (at work!) and did it in 15 minutes. It helped to draw the dates as a matrix: months on the top, dates down the side.
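In fact the matrix-elimination argument mechanises nicely. A sketch, assuming the standard ten dates from the viral version of the puzzle (Albert is told the month, Bernard the day):

```python
from collections import Counter

# The ten candidate dates from the viral puzzle (assumed here).
dates = [("May", 15), ("May", 16), ("May", 19),
         ("June", 17), ("June", 18),
         ("July", 14), ("July", 16),
         ("August", 14), ("August", 15), ("August", 17)]

# 1. Albert knows Bernard can't know: rule out any month
#    containing a day that occurs only once overall.
day_count = Counter(d for _, d in dates)
bad_months = {m for m, d in dates if day_count[d] == 1}
step1 = [(m, d) for m, d in dates if m not in bad_months]

# 2. Bernard now knows: his day must be unique among the survivors.
day_count = Counter(d for _, d in step1)
step2 = [(m, d) for m, d in step1 if day_count[d] == 1]

# 3. Albert now knows too: his month must appear exactly once.
month_count = Counter(m for m, _ in step2)
answer = [(m, d) for m, d in step2 if month_count[m] == 1]
# answer == [("July", 16)]
```

Each step just deletes the rows or columns a speaker’s statement rules out, which is exactly what the matrix on paper does.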

The press spin is that 15 year-old kids in Singapore are smarter than almost everyone in the Western world. The excellence of Singaporean secondary education is a common trope of the western press, closely followed by the superiority of Chinese, Japanese and Korean secondary education.

Of course this is nonsense. For one thing, this superannuated grey-haired Anglo did it in about fifteen minutes. (I usually do the “Can you answer these GCSE questions” quizzes and I always ace them in a very short time, though not without a couple of mis-steps along the way. The day I can’t ace them is the day I will apply for a job in product development.) For another, the exam board itself stated that this question was to help identify the better students.

And for another, this isn’t a serious question. It’s a trick. It’s the kind of trick question that a certain kind of epistemologist likes to use to discuss abstruse issues, and it’s the epistemological analogue of the trolley problem.

What makes a problem a mere trick instead of an interesting problem? An interesting problem gives rise to some theory to solve it: anything from an algorithm to a 400-page mathematical paper full of abstruse theorems. A trick is solved by a non-transferable, non-generalisable argument. Remember all those integrals you had to solve at school? You had to play guess-the-substitution to turn them into simple ones. Finding substitutions is a trick. Integration by parts is a method, even if it does involve some trial-and-error.
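For example, parts gives ∫₀¹ x·eˣ dx = [x·eˣ]₀¹ − ∫₀¹ eˣ dx = e − (e − 1) = 1, and a crude numerical check confirms the method’s answer:

```python
import math

# Integration by parts: the integral of x*exp(x) from 0 to 1 is
# [x*exp(x)] - integral of exp(x), i.e. e - (e - 1) = 1 exactly.
# A midpoint Riemann sum agrees:
n = 100_000
h = 1.0 / n
approx = h * sum((i + 0.5) * h * math.exp((i + 0.5) * h) for i in range(n))
# approx agrees with 1 to many decimal places
```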

Tricks give people the wrong idea about what a subject is about. The maths A-level syllabus used to be strong on tricks, whereas real mathematics is mostly about geometric insight to suggest theorems, and algebraic slog to prove them. Not finding a transformation or algebraic manipulation that magically makes the answer appear. Ask a serious chess-player whether they do chess puzzles: most of them don’t.

But the general public likes tricks. It likes to think that maths, or chess, or anything else that requires lots of reading, understanding, and actual insight, not to mention lots of trial and error, is really about seeing-something-that-makes-it-easy. Because that makes it magic, and the general public don’t mind not being able to do magic. Magic is, after all, just tricks. But if it’s hard work, and guessing and learning from mistakes, and adapting techniques you read about in some other context (that means reading, right?), then we’re looking at choices about how people spent their time and energy as adolescents, and spend it now as adults.

And guess what? Most people made choices that mean they can’t solve a problem that bright 15-year-olds in Singapore can solve.