Confidence Intervals for Medians and Percentiles

Medians are often more useful than means when interpreting data: they’re far less sensitive to skew and outliers, so they give a better sense of the “typical” data point. When the mean and median differ noticeably, I prefer to use the median.
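To make that concrete, here is a tiny Python illustration. The salary figures are made up for this example (they are not from any real data set); the point is only that a single extreme value drags the mean a long way while the median barely notices.

```python
import statistics

# Hypothetical salaries in thousands; the last value is a deliberate outlier.
salaries = [30, 32, 35, 36, 38, 40, 41, 45, 400]

mean_salary = statistics.mean(salaries)      # pulled upward by the 400
median_salary = statistics.median(salaries)  # the middle value of the sorted list

print(f"mean   = {mean_salary:.1f}")
print(f"median = {median_salary}")
```

Here the mean lands near 77, which is higher than every salary except the outlier, while the median stays at 38.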

One problem with using medians is that you can’t calculate a confidence interval for them the same way you calculate one for a mean. There’s no simple “standard error of the median” to plug into the usual formula. However, it turns out there is a way to calculate confidence intervals for them.

Let’s be clear about the context. When we calculate a confidence interval for a mean, we’re saying that our data is a sample from some population and that the interval is a range of plausible values for the population mean. Similarly, when we calculate a confidence interval for a median, we’re saying our data is a sample and the interval is a range of plausible values for the population median. When there’s a ton of data, a point estimate tells a reasonable story about the population, but when there’s less data, knowing how precise your estimate is can be important.

I found a nifty bit of math explaining how to calculate them here. I wrote a little R code to implement it here. You can see it here with source for the Shiny app here. Note that you’ll need a test file; here is a small one.

Fun fact: The same algorithm allows you to generate a confidence interval for percentiles other than the 50th (the median). The code I wrote lets you set the percentile and the desired confidence level.
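For readers who want to see the idea in code, here is a minimal Python sketch of one standard distribution-free approach. It is an assumption that this matches the math linked above, and it is not the R code or the Shiny app. The idea: if the data come from a continuous distribution, the number of sample points at or below the true p-th quantile follows a Binomial(n, p) distribution, so you can pick two ranks in the sorted sample that bracket the quantile with at least the desired probability. The names `binom_cdf` and `quantile_ci` and their parameters are hypothetical, chosen just for this example.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def quantile_ci(data, p=0.5, conf=0.95):
    """Order-statistic confidence interval for the p-th quantile.

    Returns (lower, upper, coverage), where coverage is the actual
    probability of the interval and is at least `conf`.
    Assumes a continuous underlying distribution (no ties).
    """
    xs = sorted(data)
    n = len(xs)
    tail = (1 - conf) / 2  # probability allowed in each tail

    # Lower rank: largest l with P(B <= l - 1) <= tail.
    l = 0
    while l < n and binom_cdf(l, n, p) <= tail:
        l += 1

    # Upper rank: smallest u with P(B >= u) <= tail.
    u = n + 1
    while u > 1 and 1 - binom_cdf(u - 2, n, p) <= tail:
        u -= 1

    if l < 1 or u > n:
        raise ValueError("sample too small for this percentile and confidence level")

    coverage = binom_cdf(u - 1, n, p) - binom_cdf(l - 1, n, p)
    return xs[l - 1], xs[u - 1], coverage


# Example: 95% interval for the median of ten values.
low, high, cov = quantile_ci(range(1, 11), p=0.5, conf=0.95)
print(low, high, f"{cov:.3f}")  # prints: 2 9 0.979
```

With ten data points the interval runs from the 2nd to the 9th smallest value, and its actual coverage (about 97.9%) is higher than the requested 95% because ranks are whole numbers. Setting `p=0.9` would target the 90th percentile instead, though extreme percentiles need more data before both ranks fall inside the sample.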

If you find this useful, let me know!
