Confidence Intervals for Medians and Percentiles
Medians are often better than means for interpretation: they’re far less affected by outliers and by skewed or otherwise non-normal distributions, so they give a better sense of the “typical” data point. When the mean and median differ, I prefer to use the median.
One problem with using medians is that you can’t calculate a confidence interval for them the same way you calculate one for a mean. There’s no simple “standard error of the median” that you can plug into the usual formula. However, it turns out there is a way to calculate confidence intervals for them.
Let’s be clear about the context. When we calculate a confidence interval for a mean, we’re saying that our data is a sample from some population and that the interval is a range of plausible values for the population mean. Similarly, when we calculate a confidence interval for a median, we’re saying our data is a sample and the interval is a range of plausible values for the population median. When there’s a ton of data, a point estimate tells a reasonable story about the population, but when there’s less data, knowing how accurate your estimate is can be important.
I found a nifty bit of math explaining how to calculate them here. I wrote a little R code to implement it here. You can see it running here, with the source for the Shiny app here. Note that you’ll need a test file; here is a small one.
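For readers who want to see the idea in code, here is a minimal Python sketch of one standard distribution-free approach, which uses order statistics and the binomial distribution. It is not the R code mentioned above, and it may differ in detail from the linked write-up; the function name `median_ci` and its `conf` parameter are made up for illustration. The reasoning: each observation falls below the true median with probability 0.5, so the number of observations below it follows a Binomial(n, 0.5) distribution, and that tells us which sorted values to use as the interval’s endpoints.

```python
from math import comb

def median_ci(data, conf=0.95):
    """Sketch of a distribution-free confidence interval for the median.

    Returns (lower, upper, coverage), where lower and upper are observed
    values and coverage is the actual confidence level achieved, which is
    at least `conf` (assuming a continuous distribution, i.e. no ties at
    the true median).
    """
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - conf

    # B = number of observations below the true median ~ Binomial(n, 0.5).
    # Find the largest rank k (1-based) such that P(B <= k - 1) <= alpha / 2.
    k = 0
    tail = 0.0        # running value of P(B <= i)
    tail_at_k = 0.0   # P(B <= k - 1) for the chosen k
    for i in range(n):
        tail += comb(n, i) / 2**n
        if tail > alpha / 2:
            break
        k = i + 1
        tail_at_k = tail

    if k == 0:
        raise ValueError("sample too small for the requested confidence level")

    # By symmetry of Binomial(n, 0.5), the matching upper rank is n - k + 1.
    lower = xs[k - 1]
    upper = xs[n - k]
    coverage = 1 - 2 * tail_at_k
    return lower, upper, coverage

lo, hi, cov = median_ci([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(lo, hi, round(cov, 4))
```

For the ten values 1 through 10 at a 95% level, this picks the 2nd and 9th smallest values, and the actual coverage is about 97.9%. Because only whole ranks are available, the achieved level is usually somewhat higher than the one requested, and with very small samples (five or fewer points at 95%) no such interval exists at all.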
Fun fact: The same algorithm allows you to generate a confidence interval for percentiles other than the 50th (the median). The code I wrote lets you set the percentile and the desired confidence level.
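To illustrate how a percentile other than the median might be handled, here is a hedged Python sketch of the order-statistic approach with the percentile as a parameter. This is an assumption about the general technique, not a port of my R code; the names `quantile_ci`, `binom_cdf`, `p`, and `conf` are hypothetical. The only change from the median case is that each observation falls below the target percentile with probability `p` instead of 0.5, so the count below it is Binomial(n, p) and the two ranks are no longer symmetric.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p), by direct summation."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(min(k, n) + 1))

def quantile_ci(data, p=0.5, conf=0.95):
    """Sketch of a distribution-free CI for the p-th quantile (0 < p < 1).

    Returns (lower, upper, coverage); coverage is the achieved confidence
    level, at least `conf` for a continuous distribution.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - conf

    # Lower rank: largest l (1-based) with P(B <= l - 1) <= alpha / 2.
    lower_rank = 0
    for l in range(1, n + 1):
        if binom_cdf(l - 1, n, p) > alpha / 2:
            break
        lower_rank = l

    # Upper rank: smallest u (1-based) with P(B >= u) <= alpha / 2.
    upper_rank = 0
    for u in range(1, n + 1):
        if 1 - binom_cdf(u - 1, n, p) <= alpha / 2:
            upper_rank = u
            break

    if lower_rank == 0 or upper_rank == 0:
        raise ValueError("sample too small for this percentile and confidence level")

    coverage = binom_cdf(upper_rank - 1, n, p) - binom_cdf(lower_rank - 1, n, p)
    return xs[lower_rank - 1], xs[upper_rank - 1], coverage

# Example: a 90% interval for the 90th percentile of the values 1..100.
lo, hi, cov = quantile_ci(list(range(1, 101)), p=0.9, conf=0.9)
print(lo, hi, round(cov, 3))
```

Percentiles far from the middle need more data before an interval exists, because one of the two tails runs out of observations. The direct summation in `binom_cdf` is fine for small and moderate samples; for very large ones a statistics library’s binomial routines would be a safer choice.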
If you find this useful, let me know!
Just sent this to a former colleague who needs to rely on this for his work. He told me he will compare it with the version I wrote in Visual Basic before I came across your blog post. I’m looking forward to finding out whether the results are similar.