Figuring significance significant figures

I recently sat through some great grad student presentations. Most of those presenting empirical results made a common mistake: they kept way too many digits in their presented results.  There are two problems with showing more digits than necessary:  false certainty and lack of clarity.  The extra certainty is certainly false because we know how accurate the estimated coefficients are:  that’s exactly what the standard errors tell us! Extra digits reduce clarity by cluttering up an already hard-to-read table with extra, unnecessary information.

My recommendation

If you took chemistry in high school, you might remember there are some rules for how many significant figures to keep.  For intermediate calculations, keep all the figures you have. For final results, keep one more digit than you can justify based on how the measurement was taken.  For example, if you are simply recording and presenting temperatures, and the analog thermometer is marked to the nearest degree, you should estimate and record the temperature to the nearest tenth of a degree.

When reporting the results of statistical analysis in a table, commonly reported quantities are point estimates (the coefficients), standard errors, t-values, and p-values. Some R output might look like this:

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.03559     0.10886    0.327     0.744    
x            1.90094     0.09950   19.105    <2e-16 ***

First, trim each standard error to two significant figures.  No, the standard errors don't all have to show the same number of decimal places; a standard error has the same units as its coefficient, which are $\frac{\text{units of }y}{\text{units of }x}$.  Different independent variables $x_1, x_2, \ldots$ have different units, so their standard errors can reasonably have different numbers of digits.  Why two digits? Because all you can really justify for a standard error is one digit, and as with the general rule for final results, we keep one extra.  The difference between a standard error of 0.1 and 0.11 makes no difference to the conclusions (inferences) you draw.

While you’re going to two sig figs with the standard error, do the same with t-values and p-values. This gives an intermediate form of

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.03559        0.11     0.33      0.74    
x            1.90094        0.10    19.      <2e-16 ***
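
The trimming step can be sketched in a few lines of Python. This is only an illustration, not how R produced the table: the helper name `round_sig` is hypothetical, and rounding halves upward is my assumption. Numbers are passed as strings and handled with `decimal` so that a trailing zero (as in 0.10) is kept.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(text, sig=2):
    """Round a number (given as a string) to `sig` significant figures.

    Hypothetical helper for illustration; returns a Decimal so that
    trailing zeros such as the one in 0.10 survive.
    """
    value = Decimal(text)
    if value == 0:
        return value

    def quantum_for(d):
        # adjusted() is the power of ten of the leading digit:
        # -1 for 0.10886, 1 for 19.105.
        return Decimal(f"1e{d.adjusted() - (sig - 1)}")

    rounded = value.quantize(quantum_for(value), rounding=ROUND_HALF_UP)
    if rounded.adjusted() != value.adjusted():
        # Rounding carried into a new leading digit (0.0995 -> 0.100),
        # so drop the extra digit to get back to `sig` figures.
        rounded = rounded.quantize(quantum_for(rounded), rounding=ROUND_HALF_UP)
    return rounded


# Standard errors, t-values, and the one numeric p-value from the R output.
for raw in ["0.10886", "0.09950", "0.327", "19.105", "0.744"]:
    print(raw, "->", round_sig(raw))
```

The p-value reported as `<2e-16` is a bound rather than a number, so it is left exactly as R printed it.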

For the coefficients, keep one fewer decimal place than the trimmed standard error shows.  That's the statistical accuracy of your coefficient.  You might easily have more significant digits in the coefficient than in the standard error (certainly if the coefficient is very significantly different from zero), or you might have a coefficient that now looks like zero.

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      0.0        0.11     0.33      0.74    
x                1.9        0.10    19.      <2e-16 ***
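
A minimal Python sketch of the coefficient rule, assuming the standard error has already been trimmed to two significant figures and is passed as a string such as "0.11". The function name `round_coefficient` is hypothetical, and rounding halves upward is my assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_coefficient(estimate, trimmed_se):
    """Round `estimate` to one fewer decimal place than `trimmed_se` shows.

    Both arguments are strings; `trimmed_se` is the standard error as it
    appears in the table after trimming (e.g. "0.11" or "0.10").
    """
    # Exponent of the last digit shown in the standard error: -2 for "0.11".
    se_exponent = Decimal(trimmed_se).as_tuple().exponent
    # Keep one fewer decimal place, i.e. round to the next power of ten up.
    quantum = Decimal(f"1e{se_exponent + 1}")
    return Decimal(estimate).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_coefficient("0.03559", "0.11"))  # intercept
print(round_coefficient("1.90094", "0.10"))  # slope on x
```

Passing the standard error as a string matters here: the trailing zero in "0.10" is what tells the function that two decimal places were shown.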

“But, the intercept looks like it’s exactly zero!”  Statistically, we cannot reject the claim (null hypothesis) that it is zero.
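
As a quick arithmetic check, here is the same point in Python using the unrounded intercept and standard error from the R output. The interval of plus or minus two standard errors is a rule-of-thumb assumption on my part, since the degrees of freedom are not given here.

```python
estimate = 0.03559   # intercept from the R output
std_error = 0.10886  # its standard error

# The t-value is just the estimate measured in standard errors.
t_value = estimate / std_error

# Rough 95% interval: estimate plus or minus two standard errors.
lower = estimate - 2 * std_error
upper = estimate + 2 * std_error

print(f"t = {t_value:.3f}")
print(f"rough interval: ({lower:.2f}, {upper:.2f})")
print("interval contains zero:", lower < 0 < upper)
```

The estimate sits about a third of a standard error from zero, so reporting it as 0.0 loses nothing the data could support.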

Keeping extra digits in reporting statistical results is unjustified, potentially misleading, and certainly obfuscating.
