Dimensionality matters: three implications of ideology being multidimensional

Left or right? Liberal or conservative? Blue or red?  We know the terms bandied about in the punditverse, but it’s easy to forget that there is more than one way to divide the world into two political ideologies.  If you categorize political ideology into two camps, you are implicitly using a one-dimensional model, a model that puts everyone on a single line from left to right.

If you think there are two basic issues — how much the government should influence the economy and how much the government should influence social issues — then you are using a two-dimensional model, a model that locates each person on a plane.

So here’s a question:  How many dimensions are there in political ideology? This question is a lot like the question “How long is the coast of Britain?”: the answer depends on how you’re going to use it.

If you want to explain variance in voting in the US House and Senate, one dimension is plenty in most cases, and two dimensions are sufficient the rest of the time. If you are looking at an entire (two-year) Congress or a longer period, one dimension will do a good job most of the time.

If you are trying to explain a single bill — or each of a set of single bills — the story is more complicated. On a given bill, some legislators vote for ideological reasons, some for procedural reasons, some because of interest groups or constituent pressure, some because of bribes, some because they are taking cues from other legislators, and some just make mistakes. When you are interested in distinguishing among legislators based on aggregate voting, few dimensions are needed. When you want to explain the path a bill takes, more dimensions may be required.

This has implications for three kinds of research we conduct based on the analysis of roll call votes:

  1. Theorizing about ideology:  When building theories about ideology, we cannot blindly assume that ideology is one-dimensional.  Arguably there are one-dimensional policy areas, legislators, and specific bills, and the one-dimensional case is worth considering, but it is not anywhere near universal.
  2. Ideology as an explained variable in empirical work:  How do parties, interest groups, constituent pressure, and other factors affect votes? When exploring factors that affect ideology (using statistical models with ideal points on the “left-hand side”), we should not ignore that there may be multiple dimensions to explain.
  3. Ideology as an explanatory variable: How does ideology affect interactions with the president, the courts, and the bureaucracy? Even scholars of institutions other than Congress should remember that when ideology is on the right-hand side of the statistical model we might want to include more than one dimension.

To indulge in the snowclone, dimensionality matters when considering ideology. Make a mindful choice when modeling it.

The better the question, the worse the answer

Justin Wolfers wrote recently about the level of interaction between economics and other social sciences.  In particular, he wonders why economic work is not well represented in a list of the books most cited in social science research.  It’s a good question: I find that many of the tools and techniques developed by economists are useful in my work studying political phenomena, and I do cite economic research.

One particularly thoughtful commenter on Wolfers’ post notes that economics combines the controversy of addressing everyday issues with the general inaccessibility of chemistry.  This combination may make some people resist the conclusions of economists, i.e., strong prior + incomprehensible evidence = small amount of updating.  The comment continues:

However, the inaccessibility of economics does not merely arise through inadvertence. As many jokes attest, economists are not merely unsentimental, they are ANTI-sentimental. An economist will often revel in the opportunity to rub people’s noses in the conclusion that their pre-conceptions are fluffy-headed poppycock. To many people (including some economists, I fear), economics appears to be less a social science than a religion, revealing to a chosen few the mighty counter-intuitive truths by which to pass judgment on a sinful world.

If I’m an accomplished scholar in another discipline, to what extent am I open and receptive to this kind of intellectual upbraiding?

Another commenter notes:

My guess is at least part of this effect is reciprocity. Economists are famously bad when it comes to citing relevant research from the other social sciences. And I am not even talking about humanistic research: experiments from social psychology and statistical work from sociology are often ignored when economists do related (or even nearly identical) work. There is a widely held perception in the other social sciences that economists are, at best, disinterested in having interdisciplinary conversations–or, at worst, regularly tolerate cross-disciplinary plagiarism. My guess is that a culture of arrogance is the ultimate cause.

These are legitimate concerns, and there are reasonable rejoinders, but I think there is a more compelling explanation than “economists are a$$holes.”

When we develop a theory, we throw away some of the details about the world in order to make the theory simple enough to understand.  This abstraction is an art: there are no hard rules and too few guidelines on how to make useful theories.

Suppose that there is only one choice: how much detail to discard.  One can (grossly over-)generalize by saying that economists throw away more detail, ignoring things like irrational behavior, while other social scientists throw away less detail.  By simplifying more, the economists gain the ability to apply very technical tools to generate results that are very reliable given the assumptions they made.  By simplifying less, other social scientists are left with questions that more closely resemble reality but are harder to analyze.

Internal validity versus external validity.  Tractability versus verisimilitude.  We’ve heard this song and dance before.  The implied question is whether something like Heisenberg’s Uncertainty Principle holds:

Hypothesis:  (time to analyze) * (problems with applicability) >= (some constant)

Does this hold?  Actually, I am optimistic that it doesn’t.  Regardless, we all face choices when trying to tell a story that explains something.

You’re Asking the Wrong Question, Fortunately

Today I got up, finishing a decision I started last night about how much to sleep before today.  I will choose my attire to fit the weather and strike the right tone in the classes I will teach. I will go to work and spend the day making optimal decisions about how to allocate my time and effort, considering both my immediate goals (teaching effectively and preparing for an experiment) and my longer-term goals, like getting along with my peers and building my tenure packet.  I will come home along a route that balances safety, convenience, fuel economy, and curiosity.  I will talk with my wife, play with my daughter, and read to my son, all with an eye toward building both their individual lives and my relationships with them.  I may make a few allocation decisions about improving our house or saving for retirement.  I will decide whether to work out tomorrow morning, then begin the decision about how much to sleep before tomorrow.

I don’t know if I’ll make the right decisions. I cannot know, even afterward, whether I did, because it would take a very long time to check.  Constantinos Daskalakis has shown that computing Nash equilibria is, as far as anyone knows, intractable in general: for many games, a computer as complex as the entire universe could need longer than the lifetime of the universe to find one.  Everyday problems like how to invest for retirement can be framed as such games, and there all I am trying to do is optimize money.  When I read to my son I am doing something harder:  I am trying to influence his education, his confidence, his happiness, and my relationship with him, all without being so bored that I fall asleep.  Most of the decisions we make all day are too hard to solve.
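To make the combinatorics a little more concrete, here is a minimal Python sketch of the most naive approach: brute-force search for pure-strategy Nash equilibria by checking every strategy profile. This is not Daskalakis’s result, which concerns a subtler problem (finding mixed-strategy equilibria); it only illustrates that n players with k strategies each produce k^n profiles to check. The function names and the prisoner’s dilemma payoffs are illustrative assumptions.

```python
from itertools import product

def pure_nash_equilibria(payoff, n_strategies):
    """Brute-force search for pure-strategy Nash equilibria.

    payoff(profile) returns a tuple with one payoff per player;
    n_strategies[i] is the number of strategies available to player i.
    Returns the equilibria found and the number of profiles checked.
    """
    profiles = list(product(*(range(k) for k in n_strategies)))
    equilibria = []
    for profile in profiles:
        current = payoff(profile)
        # A profile is an equilibrium if no player gains by deviating alone.
        is_equilibrium = all(
            payoff(profile[:i] + (alt,) + profile[i + 1:])[i] <= current[i]
            for i, k in enumerate(n_strategies)
            for alt in range(k)
        )
        if is_equilibrium:
            equilibria.append(profile)
    return equilibria, len(profiles)

def profile_count(n_players, k):
    """Number of pure profiles when each of n_players has k strategies."""
    return k ** n_players

# Prisoner's dilemma with illustrative payoffs: 0 = cooperate, 1 = defect.
PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
print(pure_nash_equilibria(PD.__getitem__, [2, 2]))  # ([(1, 1)], 4)

# Twenty players with ten strategies each already means 10**20 profiles.
print(profile_count(20, 10))
```

Even this toy check scales exponentially with the number of players, which is one intuition for why most everyday decisions resist exact solution.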

Of course, I can simplify each of these decisions to make them tractable. But any simplifying assumptions I make are wrong, and therefore I am answering the wrong question.  I can make specific, certainly wrong assumptions, or I can approximate a bunch of decision characteristics by combining them into one or more stochastic elements in a simpler game.  Either way, I am limiting my rationality, or really acknowledging that my rationality is bounded. When I make one of my decisions, I am not answering the question before me but rather a different, simpler, wrong question.

You might object.  “Once we give up rationality, we give up prediction.  There are no limits to how we can be irrational.”  How does the joke go?  “The difference between genius and stupidity is that genius has its limits.”  There is no limit on the answers we can get if we allow our models of human behavior to admit irrational behavior.  If you allow for heuristics, anything could happen, so we know nothing!

True, anything could happen, but we can still note that some things happen more than other things.  Data can guide us.  We can examine which ways of simplifying are likely and which are unlikely.  Assigning likelihood statements to different behaviors is the statistician’s strategy.  Quantal response theory takes some steps along this route: instead of always choosing the best action, people choose better actions more often than worse ones.  However, there is still much work to be done.
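As a rough illustration of that statistician’s strategy, the logit choice rule used in the most common form of quantal response (logit QRE) gives each action a probability that rises with its payoff: P(a) is proportional to exp(λ·u(a)). With λ = 0, choices are uniformly random; as λ grows, they approach the best response. The Python sketch below covers only a single decision against fixed payoffs; a full quantal response equilibrium also requires solving a fixed point across players. The function name and the utilities are illustrative assumptions.

```python
import math

def logit_choice_probabilities(utilities, lam):
    """Logit quantal response: P(a) is proportional to exp(lam * u(a)).

    lam >= 0 is the precision parameter: lam = 0 gives uniform random
    choice, and large lam approaches always picking the best action.
    """
    best = max(utilities)  # subtracting the max keeps exp() from overflowing
    weights = [math.exp(lam * (u - best)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

utilities = [1.0, 2.0, 0.5]  # illustrative payoffs for three actions
for lam in (0.0, 1.0, 10.0):
    probs = logit_choice_probabilities(utilities, lam)
    print(lam, [round(p, 3) for p in probs])
```

Bounded rationality here is not “anything goes”: worse actions are always less likely than better ones, and λ can be estimated from data.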

Acknowledging that when we model we always lose essential details, George Box said, “All models are wrong, but some are useful.”  At some point we all have to decide to answer the wrong question, whether in the economy or just in deciding when to go to bed.  Otherwise we wouldn’t be able to get up in the morning.

All Theorists are Normative (or run that risk)

A recent exchange at the excellent Cheap Talk focused on how useless the United States’ recent promise not to nuke states that comply with the Nuclear Non-Proliferation Treaty (NPT) might be.  Sandeep Baliga writes:

This is an attempt to use a carrot and stick strategy to incentivize countries not to pursue nuclear weapons.  But is it any different from the old strategy of “ambiguity” where all options are left on the table and nothing is clarified?  Elementary game theory suggests the answer is “No”.

We are left with the conclusion that a game theoretic analysis of the Nuclear Posture Review says it seems little different from the old policy of ambiguity.

Baliga raises a simple, clear, and important question:  When we say we are not going to nuke you, what keeps us from doing it anyway?  “[T]he words of the [new policy] are just that – words.”  Baliga seems to imply that there is no reason to make such a promise.

Suppose that the assumptions of the “elementary game theory” employed are all correct.  They are standard assumptions and they have been employed correctly; therefore, our administration has wasted its time.

Let’s look at it from the other direction.  Whether or not the folks who make our policies have studied game theory, they are bright enough to look back and notice which things tend to work and which things tend not to. We may have only 6+ decades of experience with nuclear politics, but we have many, many more years of experience with the role of cheap talk in diplomacy.  Our policymakers apparently think that cheap talk can work, at least well enough to be worth the effort of making a statement.  Therefore, our administration did not waste its time.

Barring a logical error, one of three things must be true.

  1. The assumptions of the game theoretic analysis are correct (or close enough), so the administration is wrong and can learn from the theory.
  2. The assumptions of the game theoretic analysis are wrong (or not close enough), the administration is correct, and theorists need to update their model.
  3. The assumptions of the model are wrong, but the conclusion is correct, and the administration is still wrong.

Setting aside case (3), on to the central question:  Is the theorist being positive, describing the world, or normative, telling us how it should be?  (Both are important, useful roles, and formal theorists can, and I believe should, speak up on normative issues whenever science can help anchor moral choices.)  In case (1) the theorist is describing the world, providing information.  In case (2) the theorist is incorrect, but still earnestly trying to describe the world.  In either case, he is being positive.  However, by asserting or implying that case (2) is not under consideration, he is taking a normative position:  not about the conclusion (that in contexts like this cheap talk is useless) but about the assumptions.  He is saying, for example, that it should be the case that we can ignore audience costs.  Unfortunately, this assumption and other similar ones turn out not to be tenable even in theory.  More importantly, I agree with the administration that promises like this can have a real, positive effect; given this, the assumptions of the model must not be correct.

In my view, it comes down to this:  When a formal theorist derives a behavioral prediction that does not coincide with what people actually do, maybe the model is correct and we can learn from the model, or maybe the model is wrong and the modeler should learn from the world.  Perhaps one’s goal (understanding policy or shaping it) should drive that decision.

If you are confident that your theory explains the relevant situation very well, go ahead and use it to make recommendations.  Just remember Cromwell’s Rule:

I beseech you, in the bowels of Christ, think it possible that you may be mistaken.

Figuring significance significant figures

I recently sat through some great grad student presentations. Most of those presenting empirical results made a common mistake: they kept way too many digits in their presented results.  There are two problems with showing more digits than necessary:  false certainty and lack of clarity.  The extra certainty is certainly false because we know how accurate the estimated coefficients are:  that’s exactly what the standard errors tell us! Extra digits reduce clarity by cluttering up an already hard-to-read table with extra, unnecessary information.

My recommendation

If you took chemistry in high school, you might remember there are some rules for how many significant figures to keep.  For intermediate calculations, keep all figures you have. For final results keep one more than you can justify based on how the measurement was taken.  For example, if you are just recording temperatures and presenting them, and the analog thermometer has markings to the nearest degree, you should estimate and record the temperature to the nearest tenth of a degree.

When reporting the results of statistical analysis in a table, commonly reported quantities are point estimates (the coefficients), standard errors, t-values, and p-values. Some R output might look like this:

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.03559     0.10886    0.327     0.744    
x            1.90094     0.09950   19.105    <2e-16 ***

First, trim the standard error to two significant figures.  No, they don’t all have to match; standard errors use the same units as the coefficient, which are \frac{\text{units of }y}{\text{units of }x}.  For different independent variables x_1,x_2,\ldots you have different units and therefore can reasonably have different numbers of digits.  Why two digits? Because all you can really justify for a standard error is one digit, and as with the general rule for final results, we keep one extra.  The difference between a standard error of .1 and .11 is nil as far as the conclusions (inferences) you draw.
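The two-significant-figure rule is easy to automate. The sketch below is Python and behaves much like base R’s signif(), except that it rounds half up, as in school chemistry; round_sig is a hypothetical helper name, and the inputs are the standard errors, t-values, and p-values from the output above. A p-value reported as <2e-16 is a bound rather than a number, so it stays as is.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(x, sig=2):
    """Round x to `sig` significant figures, rounding half up."""
    d = Decimal(repr(x))  # the number as written, e.g. 0.10886
    if d == 0:
        return d
    last_digit = d.adjusted() - (sig - 1)  # exponent of the last digit kept
    rounded = d.quantize(Decimal(1).scaleb(last_digit), rounding=ROUND_HALF_UP)
    # Rounding can carry into a new leading digit (0.0995 -> 0.100);
    # if so, drop one decimal place to keep exactly `sig` figures.
    if rounded.adjusted() > d.adjusted():
        rounded = d.quantize(Decimal(1).scaleb(last_digit + 1), rounding=ROUND_HALF_UP)
    return rounded

for value in (0.10886, 0.09950, 0.327, 0.744, 19.105):
    print(value, "->", round_sig(value))
```

Returning a Decimal keeps trailing zeros, so 0.0995 prints as 0.10 rather than 0.1, matching the two figures the rule asks for.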

While you’re going to two sig figs with the standard error, do the same with t-values and p-values. This gives an intermediate form of

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.03559        0.11     0.33      0.74    
x            1.90094        0.10    19.      <2e-16 ***

For the coefficients, keep one digit less than the standard error.  That’s the statistical accuracy of your coefficient.  You might easily have more significant digits in the coefficient than the standard error — certainly, if the coefficient is very much significantly different from zero — or you might have a coefficient that now looks like zero.

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      0.0        0.11     0.33      0.74    
x                1.9        0.10    19.      <2e-16 ***

“But, the intercept looks like it’s exactly zero!”  Statistically, we cannot reject the claim (null hypothesis) that it is zero.
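The coefficient rule can be sketched the same way: find how many decimal places the standard error keeps at two significant figures, then print the estimate with one fewer. This Python sketch uses hypothetical helper names and should reproduce the trimmed estimates in the table above. For standard errors of 10 or more it simply prints whole numbers; rounding estimates to the tens place is left out.

```python
from decimal import Decimal, ROUND_HALF_UP

def se_decimal_places(se, sig=2):
    """Decimal places shown when se is rounded to `sig` significant figures."""
    d = Decimal(repr(abs(se)))
    last_digit = d.adjusted() - (sig - 1)
    rounded = d.quantize(Decimal(1).scaleb(last_digit), rounding=ROUND_HALF_UP)
    if rounded.adjusted() > d.adjusted():  # carry, e.g. 0.0995 -> 0.100
        last_digit += 1
    return max(0, -last_digit)

def format_estimate(estimate, se, sig=2):
    """Show the estimate with one decimal place fewer than its standard error."""
    places = max(0, se_decimal_places(se, sig) - 1)
    return f"{estimate:.{places}f}"

rows = [("(Intercept)", 0.03559, 0.10886), ("x", 1.90094, 0.09950)]
for name, estimate, se in rows:
    print(name, format_estimate(estimate, se))
```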

Keeping extra digits in reporting statistical results is unjustified, potentially misleading, and certainly obfuscating.

Computational modeling is not: simulation

Computational modeling and simulation have a lot in common. They both involve using computers, they both use encoded descriptions of how things work, and they both “run” one or (usually) many times.  The easiest way to see how they differ is to note their very different goals.

Simulations are run for the purpose of making specific predictions. The rules can make sense, but this is not necessary.  A simulation is judged by whether or not the predictions are accurate. The results can be very sensitive to all kinds of things.

Computational models are run for the purpose of understanding general principles and, hopefully, generating testable hypotheses. The rules are simple compared to the “real” rules, but must make sense. A computational model is judged by the same standard as any other formal model, by how interesting and general its results are.  As such, the results should be robust to changes in all kinds of things.

A simulation can be based on a thousand equations. Suppose Equation 462 ends in “+ 7”. One might ask, “Why ‘+ 7’?”  An appropriate response for a simulation is “If we change it to some other value, it doesn’t work.”  This is not a good response for a computational model; a computational model that breaks so easily is too fragile, and therefore not general enough to be useful.
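One way to hold a computational model to this standard is a robustness sweep: rerun the model across a range of values for a constant and check whether the qualitative conclusion survives. The Python sketch below uses a hypothetical toy model, invented for illustration, in which agents repeatedly move partway toward the group average; the point is the check, not the model.

```python
def holds_across(run_model, conclusion, constants):
    """Fraction of tried constants for which the qualitative conclusion holds."""
    results = [conclusion(run_model(c)) for c in constants]
    return sum(results) / len(results)

def run_averaging_model(step, n_rounds=200):
    """Hypothetical toy model: each agent moves `step` of the way to the mean."""
    positions = [0.0, 3.0, 10.0]
    for _ in range(n_rounds):
        mean = sum(positions) / len(positions)
        positions = [p + step * (mean - p) for p in positions]
    return positions

def reaches_consensus(positions, tol=1e-6):
    """Qualitative conclusion: all agents end up (essentially) together."""
    return max(positions) - min(positions) < tol

# Robust: consensus for every step size tried between 0 and 1.
print(holds_across(run_averaging_model, reaches_consensus, [0.1, 0.3, 0.5, 0.7, 0.9]))
# Outside that range (step = 2.5 overshoots), the conclusion fails.
print(holds_across(run_averaging_model, reaches_consensus, [0.5, 2.5]))
```

A model whose conclusion held only at one magic value of the constant would score near zero on a sweep like this, which is the “+ 7” problem in miniature.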

Wikipedia gets this right when defining computational modeling, but exactly wrong with two of its examples: weather forecasting and flight simulators.

Different folks use these words differently, and that’s fine. My point is that there is a difference between these two goals (prediction v. understanding) and the difference in goals leads to different choices in the practice of each.

Terrorism is not Lightning or Peanut Butter

I came across the book Panicology, where “Two Statisticians Explain What’s Worth Worrying About (and What’s Not) in the 21st Century”.  The back cover chastens the reader:

Terrorism?

More Americans have been killed by lightning or by peanut allergies than by terrorist attacks.

I’ve read this comparison in different forms many times; it is true, but misleading. The implication is that, because you don’t spend much time defending against lightning strikes, and that is reasonable, you are foolish to spend much time defending against terrorist attacks. Now let’s apply Drake/Backus equation reasoning to lightning and terrorism separately to see why these are not the same.

A model of getting hit by lightning during a given storm might include the following components:

A_1 = Pr(lightning hitting me given lightning generated and I’m outside)

A_2 = Pr(lightning hitting me given lightning generated and I’m in the basement)

B_1 = Pr(lightning generated given I’m outside)

B_2 = Pr(lightning generated given I’m in the basement)

C = Pr(I’m outside rather than in the basement)

The probability of me getting hit by lightning is then A_1 \cdot B_1 \cdot C + A_2 \cdot B_2 \cdot (1-C).  Now, I doubt my presence outside or in the basement affects the voltage potential between Earth and sky enough to influence the frequency of lightning strikes, no matter how much iron is in my diet. This means that B and C are independent, so B_1 = B_2.  Let B = B_1 = B_2, and factor out B.  The probability of me getting hit by lightning is B \cdot [A_1 \cdot C + A_2 \cdot (1-C)].  Assuming I’m safer in the basement (A_1 > A_2), I can minimize the chance of getting hit by minimizing C, the probability that I am outside. However, as long as A_1 is very small, C hardly matters, so I might as well go outside if I like.
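Plugging numbers into the lightning model makes the point concrete. The Python sketch below evaluates A_1 B_1 C + A_2 B_2 (1-C) for a homebody (C = 0) and for someone who always goes outside (C = 1); every probability in it is a made-up illustrative value, not an estimate.

```python
def p_bad_event(a1, a2, b1, b2, c):
    """A1*B1*C + A2*B2*(1 - C): probability of the bad event in the text's model."""
    return a1 * b1 * c + a2 * b2 * (1 - c)

# Made-up values for one storm. B1 == B2 because lightning does not
# care where I am.
a1, a2 = 1e-4, 0.0  # chance a generated strike hits me: outside vs. basement
b = 0.5             # chance the storm generates lightning at all

always_inside = p_bad_event(a1, a2, b, b, c=0.0)
always_outside = p_bad_event(a1, a2, b, b, c=1.0)
# The whole effect of my choice is B * (A1 - A2), which is tiny.
print(always_outside - always_inside)
```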

How does this model fare when applied to terrorism? Not as well.  The analogous model of being a terrorist victim on a given flight would include:

A_1 = Pr(terrorist successfully affecting me given terrorist attempt and I vote for less airport security)

A_2 = Pr(terrorist successfully affecting me given terrorist attempt and I vote for more airport security)

B_1 = Pr(terrorist attempt given I vote for less airport security)

B_2 = Pr(terrorist attempt given I vote for more airport security)

C = Pr(I vote for less airport security)

Again, the probability of the bad event is A_1 \cdot B_1 \cdot C + A_2 \cdot B_2 \cdot (1-C). However, now B_1 \neq B_2; in fact, we expect B_1 > B_2, because with less security the terrorist has a greater chance of success given an attempt (A_1 > A_2), which makes an attempt more attractive. If we have no airport security, then A_1 becomes large, so my choice C really matters.
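Because the model is linear in C, the effect of my choice is just the slope A_1 B_1 - A_2 B_2. The Python sketch below, again with made-up numbers, holds A_1 and A_2 fixed and compares an attacker who ignores security (B_1 = B_2) with one who responds to it (B_1 > B_2). When attempts respond to security, the slope no longer factors as B (A_1 - A_2), and the two effects compound.

```python
def effect_of_choice(a1, a2, b1, b2):
    """Slope of A1*B1*C + A2*B2*(1 - C) with respect to C: A1*B1 - A2*B2."""
    return a1 * b1 - a2 * b2

# Made-up values, not estimates. Less security raises the success rate
# from 0.05 to 0.5 in both scenarios.
no_response = effect_of_choice(a1=0.5, a2=0.05, b1=0.01, b2=0.01)
with_response = effect_of_choice(a1=0.5, a2=0.05, b1=0.01, b2=0.002)
print(no_response, with_response)  # deterrence makes C matter more
```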

Terrorism is not lightning.  Terrorists respond to our choices about security; lightning does not respond to my choices about where I spend the rainstorm. Deciding whether peanut butter responds to one’s choices is left as an exercise for the reader.

The Engineer’s Fallacy

As a mathematician-turned-social-scientist, I have first-hand experience with the traps a physical scientist can fall into when trying to explain how people act and interact. This is the first of many posts in which I will describe my favorite error, which I have come to call “The Engineer’s Fallacy”.  Rather than define it straight away, I will start with a recent example making its way around the mediascape.

Peter Backus gained some notoriety when he applied the reasoning of the Drake equation to explaining why he didn’t have a girlfriend. The reasoning in both cases is simple.  What is the fraction of people in the local area of the right age?  What fraction of those are female? What fraction of those are single?  What fraction of those are cute? What fraction of those would find me cute? What fraction of those are out on a given night?  Continue, multiply the fractions together, and you get the probability of meeting a future girlfriend tonight.

What is the hidden assumption that allows us to combine those individual probabilities so easily via multiplication? Independence. Backus assumes that he meets people at random and that people in all these cross-cutting groups (young/old, male/female, married/single) are distributed across town without being at all correlated. Put more simply, he assumes that people don’t, in any meaningful way, associate with similar people. This is enough to make a social network scientist cry.
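A toy example shows what the independence assumption does. In the hypothetical town below (the population counts are invented for illustration), young people are disproportionately single. Multiplying the marginal fractions, as a Drake-style calculation does, understates the share of young singles; chaining conditional fractions recovers the true share.

```python
# Hypothetical toy town: each person is (age_group, relationship_status).
people = ([("young", "single")] * 30 + [("young", "married")] * 10
          + [("old", "single")] * 10 + [("old", "married")] * 50)

def fraction(pred, population):
    """Share of the population satisfying pred."""
    return sum(1 for p in population if pred(p)) / len(population)

p_young = fraction(lambda p: p[0] == "young", people)    # 40 of 100
p_single = fraction(lambda p: p[1] == "single", people)  # 40 of 100
independent_guess = p_young * p_single                   # assumes independence

actual_joint = fraction(lambda p: p == ("young", "single"), people)

# Conditioning each factor on the previous ones fixes the product.
young = [p for p in people if p[0] == "young"]
p_single_given_young = fraction(lambda p: p[1] == "single", young)
chained = p_young * p_single_given_young

print(independent_guess, actual_joint, chained)
```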

That’s the theoretical objection, but an empirical objection is stronger evidence.  Suppose Backus’s reasoning is correct, that the chance that he will meet that special person is 0.00034% on a given night.  Continuing with his reasoning, if he goes out both nights of every weekend for about 27 years, that gives him a 1% chance of meeting a Miss Right. Over his lifetime he might have a 2% chance of meeting someone. If he is typical, then most other people should have about a 2% chance of meeting someone worth dating at any point in their lifetime.  However, according to Backus, about half the people in the area are married.  Presumably, most of these people dated beforehand, and likely dated more than one person.
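The arithmetic holds up. This short Python calculation takes the 0.00034% nightly chance at face value and treats nights as independent, which is itself an assumption of the same kind. It computes the chance of at least one success over 27 years of two nights out per weekend, and then over twice that span.

```python
def chance_of_at_least_one(p_per_night, nights):
    """P(at least one success in `nights` independent tries)."""
    return 1 - (1 - p_per_night) ** nights

p_night = 0.00034 / 100          # 0.00034% per night, from the text
nights_27_years = 2 * 52 * 27    # two nights per weekend, 52 weekends a year

print(round(chance_of_at_least_one(p_night, nights_27_years), 4))      # prints 0.0095
print(round(chance_of_at_least_one(p_night, 2 * nights_27_years), 3))  # prints 0.019
```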

Almost as amusing as Backus’s argument is the “encouraging” rejoinder by Diego Trujillo, who suggests taking a thermodynamical approach! Trujillo doubles down on the independence assumption. Applying his reasoning, there is an equal chance of me (1) not having a girlfriend and (2) having [insert famous woman here] as a girlfriend.

Independence is a strong assumption. Social scientists typically are inculcated with more skepticism about it than physical scientists, and for good reason: people are social.