Archive for February 2010

Computational modeling is not: simulation

Computational modeling and simulation have a lot in common. Both involve computers, both use encoded descriptions of how things work, and both are “run” one or (usually) many times. The easiest way to see how they differ is to note their very different goals.

Simulations are run for the purpose of making specific predictions. The rules can make sense, but this is not necessary.  A simulation is judged by whether or not the predictions are accurate. The results can be very sensitive to all kinds of things.

Computational models are run for the purpose of understanding general principles and, hopefully, generating testable hypotheses. The rules are simple compared to the “real” rules, but must make sense. A computational model is judged by the same standard as any other formal model: how interesting and general its results are. As such, the results should be robust to changes in all kinds of things.

A simulation can be based on a thousand equations. Suppose Equation 462 ends in “+ 7”. One might ask, “Why ‘+ 7’?” An appropriate response for a simulation is “If we change it to some other value, it doesn’t work.” This is not a good response for a computational model; a computational model that breaks so easily is too fragile, and therefore not general enough to be useful.

Wikipedia gets this right when defining computational modeling, but exactly wrong with two of its examples: weather forecasting and flight simulators.

Different folks use these words differently, and that’s fine. My point is that there is a difference between these two goals (prediction v. understanding) and the difference in goals leads to different choices in the practice of each.

Terrorism is not Lightning or Peanut Butter

I came across the book Panicology, where “Two Statisticians Explain What’s Worth Worrying About (and What’s Not) in the 21st Century”.  The back cover chastens the reader:

Terrorism?

More Americans have been killed by lightning or by peanut allergies than by terrorist attacks.

I’ve read this comparison in different forms many times; it is true, but misleading. The implication is that, because you don’t spend much time defending against lightning strikes, and that is reasonable, you are foolish to spend much time defending against terrorist attacks. Now let’s apply Drake/Backus equation reasoning to lightning and terrorism separately to see why these are not the same.

A model of getting hit by lightning during a given storm might include the following components:

A_1 = Pr(lightning hitting me given lightning generated and I’m outside)

A_2 = Pr(lightning hitting me given lightning generated and I’m in the basement)

B_1 = Pr(lightning generated given I’m outside)

B_2 = Pr(lightning generated given I’m in the basement)

C = Pr(I’m outside rather than in the basement)

The probability of me getting hit by lightning is then A_1 \cdot B_1 \cdot C + A_2 \cdot B_2 \cdot (1-C). Now, I doubt my presence outside or in the basement affects the voltage between Earth and sky so much as to influence the frequency of lightning strikes, no matter how much iron is in my diet. This means that lightning generation is independent of where I am, so B_1 = B_2. Let B = B_1 = B_2, and factor out B. The probability of me getting hit by lightning is B \cdot [A_1 \cdot C + A_2 \cdot (1-C)]. Assuming I’m safer in the basement (A_1 > A_2), I can minimize the chance of getting hit by minimizing C, the probability that I am outside. However, as long as A_1 is very small, C hardly matters, so I might as well go outside if I like.
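The lightning model can be sketched in a few lines of Python. This is only an illustration: the function name and every number below are made up for the example, not measured rates. It checks that when B_1 = B_2 the full expression matches the factored one.

```python
def hit_probability(a_out, a_base, b_out, b_base, c_out):
    """A_1*B_1*C + A_2*B_2*(1-C): chance of being hit during one storm."""
    return a_out * b_out * c_out + a_base * b_base * (1.0 - c_out)


# Illustrative values only (assumptions, not data).
A1, A2 = 1e-6, 1e-9   # chance of being hit given lightning, outside vs. basement
B = 0.5               # chance the storm generates lightning, wherever I am
C = 0.3               # chance I choose to be outside

full = hit_probability(A1, A2, B, B, C)
factored = B * (A1 * C + A2 * (1.0 - C))

always_outside = hit_probability(A1, A2, B, B, 1.0)
always_basement = hit_probability(A1, A2, B, B, 0.0)

print(full, factored)
print(always_outside, always_basement)
```

With these made-up values both choices leave the risk tiny in absolute terms, which is the sense in which C hardly matters.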

How does this model fare when applied to terrorism? Not as well. The analogous model of being a terrorist victim on a given flight would include:

A_1 = Pr(terrorist successfully affecting me given terrorist attempt and I vote for less airport security)

A_2 = Pr(terrorist successfully affecting me given terrorist attempt and I vote for more airport security)

B_1 = Pr(terrorist attempt given I vote for less airport security)

B_2 = Pr(terrorist attempt given I vote for more airport security)

C = Pr(I vote for less airport security)

Again, the probability of the bad event is A_1 \cdot B_1 \cdot C + A_2 \cdot B_2 \cdot (1-C). However, now B_1 \neq B_2; in fact, we expect B_1 > B_2, because with less security an attempt is more likely to succeed (A_1 > A_2), which makes an attempt more attractive. If we have no airport security then A_1 becomes large, so my choice C really matters.
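The same expression can be evaluated with B_1 > B_2 to see how much the choice now matters. A minimal sketch, with every number invented for illustration: because the attempt rate moves in the same direction as the success rate, the gap between the two choices is larger than the gap in A alone.

```python
def event_probability(a1, a2, b1, b2, c):
    """A_1*B_1*C + A_2*B_2*(1-C): chance of the bad event on one flight."""
    return a1 * b1 * c + a2 * b2 * (1.0 - c)


# Illustrative values only (assumptions, not data).
A1, A2 = 0.5, 0.05    # success given an attempt: less vs. more security
B1, B2 = 1e-5, 1e-6   # chance of an attempt: less vs. more security

risk_less_security = event_probability(A1, A2, B1, B2, 1.0)
risk_more_security = event_probability(A1, A2, B1, B2, 0.0)

# If attempts did not respond to security, only A would differ between choices.
ratio_if_b_fixed = A1 / A2
# When attempts do respond, the two effects multiply.
ratio_responsive = risk_less_security / risk_more_security

print(ratio_if_b_fixed, ratio_responsive)
```

Here B cannot be factored out, so nothing reduces the problem to the lightning case.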

Terrorism is not lightning.  Terrorists respond to our choices about security; lightning does not respond to my choices about  where I spend the rainstorm. Deciding whether peanut butter responds to one’s choices is left as an exercise for the reader.

The Engineer’s Fallacy

As a mathematician-turned-social-scientist, I have first-hand experience with the traps a physical scientist can fall into when trying to explain how people act and interact. This is the first of many posts in which I will describe my favorite error, which I have come to call “The Engineer’s Fallacy”. Rather than define it straight away, I will start with a recent example making its way around the mediascape.

Peter Backus gained some notoriety when he applied the reasoning of the Drake equation to explaining why he didn’t have a girlfriend. The reasoning in both cases is simple.  What is the fraction of people in the local area of the right age?  What fraction of those are female? What fraction of those are single?  What fraction of those are cute? What fraction of those would find me cute? What fraction of those are out on a given night?  Continue, multiply the fractions together, and you get the probability of meeting a future girlfriend tonight.

What is the hidden assumption that allows us to combine those individual probabilities so easily via multiplication? Independence. Backus assumes that he meets people at random and that people in all these cross-cutting groups (young/old, male/female, married/single) are distributed across town without being at all correlated. Put more simply, he assumes that people don’t, in any meaningful way, associate with similar people. This is enough to make a social network scientist cry.
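A toy example shows how far the multiplied fractions can drift once attributes are correlated. The population below is entirely hypothetical, and it covers only one side of the objection (correlated attributes, not who meets whom): people of the right age are more likely to be single than everyone else.

```python
# Hypothetical population of 100 people, each a (right_age, single) pair.
population = (
    [(True, True)] * 40      # right age and single
    + [(True, False)] * 10   # right age, not single
    + [(False, True)] * 10   # wrong age, single
    + [(False, False)] * 40  # wrong age, not single
)
n = len(population)

frac_right_age = sum(age for age, single in population) / n
frac_single = sum(single for age, single in population) / n

# Drake-style estimate: multiply the two fractions as if independent.
product_estimate = frac_right_age * frac_single

# What the population actually contains.
actual_joint = sum(age and single for age, single in population) / n

print(product_estimate, actual_joint)
```

In this made-up population the product says a quarter of people qualify, while counting directly finds 40 of 100.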

That’s the theoretical objection, but an empirical objection is stronger evidence. Suppose Backus’s reasoning is correct, and the chance that he will meet that special person is 0.00034% on a given night. Continuing with his reasoning, if he goes out both nights of every weekend for about 27 years, that gives him about a 1% chance of meeting a Miss Right. Over his lifetime he might have a 2% chance of meeting someone. If he is typical, then most other people should have about a 2% chance of meeting someone worth dating at any point in their lifetime. However, according to Backus, about half the people in the area are married. Presumably, most of these people dated beforehand, and likely dated more than one person. A 2% lifetime chance cannot produce that many couples, so the reasoning must be wrong somewhere.
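The 27-year figure is easy to check. The sketch below assumes 52 weekends a year and, following Backus’s own reasoning, treats each night as an independent trial.

```python
# Per-night chance from the post: 0.00034%, converted to a plain probability.
p_night = 0.00034 / 100

# Assumption: two nights out per weekend, 52 weekends a year, for 27 years.
nights = 2 * 52 * 27

# Chance of at least one success across all those nights.
p_at_least_once = 1 - (1 - p_night) ** nights

# For probabilities this small, nights * p_night is a close approximation.
approx = nights * p_night

print(f"{nights} nights: {p_at_least_once:.4f} (approx. {approx:.4f})")
```

Both the exact value and the approximation land just under 1%.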

Almost as amusing as Backus’s argument is the “encouraging” rejoinder by Diego Trujillo, who suggests taking a thermodynamical approach! Trujillo doubles down on the independence assumption. Applying his reasoning, there is an equal chance of me (1) not having a girlfriend and (2) having [insert famous woman here] as a girlfriend.

Independence is a strong assumption. Social scientists typically are inculcated with more skepticism about it than physical scientists, and for good reason: people are social.