Archive for the ‘Statistics’ Category.
February 16, 2016, 3:06 pm
Medians are better than means in most interpretation contexts: they’re robust to outliers and far less distorted by skewed or otherwise non-normal distributions. They give a better sense of the “typical” data point. When the mean and median differ, I prefer to use the median.
One problem with using medians is that you can’t calculate a confidence interval for them the way you calculate one for a mean: there’s no simple “standard error of the median” to plug into the usual estimate ± 1.96 × SE recipe. However, it turns out there is a way to calculate confidence intervals for them. Continue reading ‘Confidence Intervals for Medians and Percentiles’ »
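For reference, the standard distribution-free construction uses order statistics: the number of observations below the true median is Binomial(n, 0.5), so binomial quantiles tell you which order statistics bracket the median at your desired coverage. A minimal sketch in Python (the full post may use a slightly different construction; scipy is assumed):

```python
import numpy as np
from scipy import stats

def median_ci(x, conf=0.95):
    """Distribution-free CI for the median via order statistics: the count
    of observations below the true median is Binomial(n, 0.5), so binomial
    quantiles pick the order statistics that bracket it."""
    x = np.sort(np.asarray(x))
    n = len(x)
    alpha = 1 - conf
    lo = max(int(stats.binom.ppf(alpha / 2, n, 0.5)), 1)          # 1-based rank
    hi = min(int(stats.binom.ppf(1 - alpha / 2, n, 0.5)) + 1, n)  # 1-based rank
    return x[lo - 1], x[hi - 1]

rng = np.random.default_rng(0)
skewed = rng.lognormal(size=200)   # deliberately skewed data
print(np.median(skewed), median_ci(skewed))
```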
February 18, 2015, 10:23 pm
Stopping your A/B test as soon as you reach significance is a great way to find bogus results…if you’re a frequentist. Repeatedly checking before you’ve collected enough data for adequate statistical power will often produce false positives if you rely on classical/frequentist methods. A Bayesian with an informative null-result prior can avoid these problems. Let’s think about why. Continue reading ‘How often can Thomas Bayes check the results of his A/B test?’ »
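To make that concrete, here’s a minimal sketch (my illustration, not the post’s code) of a Bayesian peeking at a conversion-rate test with an informative prior centered on the historical baseline; the 3% rate, the prior strength, and the 99% decision threshold are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: both arms truly convert at 3%, and each arm gets an
# informative Beta prior centered on that historical baseline.
TRUE_RATE = 0.03
PRIOR = (30.0, 970.0)   # Beta prior with mean 0.03, worth ~1000 observations

def prob_b_beats_a(succ_a, n_a, succ_b, n_b, prior=PRIOR, draws=50_000):
    """Monte Carlo estimate of P(rate_B > rate_A) from the Beta posteriors."""
    a0, b0 = prior
    post_a = rng.beta(a0 + succ_a, b0 + n_a - succ_a, size=draws)
    post_b = rng.beta(a0 + succ_b, b0 + n_b - succ_b, size=draws)
    return (post_b > post_a).mean()

# Peek every 1000 visitors per arm; the informative prior keeps the
# posterior from straying far on early noise.
a = rng.random(20_000) < TRUE_RATE
b = rng.random(20_000) < TRUE_RATE
for n in range(1000, 20_001, 1000):
    p = prob_b_beats_a(a[:n].sum(), n, b[:n].sum(), n)
    if p > 0.99 or p < 0.01:
        print(f"stopped at n={n}, P(B>A)={p:.3f}")
        break
else:
    print("no decision after 20k per arm (correct: the arms are identical)")
```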
May 15, 2014, 12:44 pm
A good A/B test tool should be able to reach the following conclusions:
- A beat B or B beat A, so you can stop.
- Neither A nor B beat the other, so you can stop.
- We can’t conclude #1 or #2 yet; you’ll need about m more data points to conclude one of them.
The tools I’ve found for analyzing A/B tests can all answer #1. Some of the better ones can answer #3. None of the tools I’ve seen will answer #2 and tell you that A and B are not meaningfully different and that you have enough data to be pretty sure about that. Continue reading ‘When Enough is Enough with your A/B Test’ »
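Conclusion #2 is an equivalence question rather than a significance question, and answering it requires deciding up front what “meaningfully different” means. One standard frequentist tool for it is the two one-sided tests (TOST) procedure; here’s a minimal sketch for two conversion rates, where the ±1 percentage point margin is purely an assumption for illustration:

```python
import numpy as np
from scipy import stats

def equivalent(succ_a, n_a, succ_b, n_b, margin=0.01, alpha=0.05):
    """Two one-sided tests (TOST): declare A and B equivalent if the
    (1 - 2*alpha) CI for the difference in rates lies within ±margin."""
    pa, pb = succ_a / n_a, succ_b / n_b
    se = np.sqrt(pa * (1 - pa) / n_a + pb * (1 - pb) / n_b)
    z = stats.norm.ppf(1 - alpha)   # one-sided critical value
    lo, hi = (pb - pa) - z * se, (pb - pa) + z * se
    return -margin < lo and hi < margin

print(equivalent(300, 10_000, 310, 10_000))  # True: well within ±1 point
```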
March 3, 2011, 5:09 pm
What is a methods-careful practitioner to do when the number of observations (n) is small? I don’t know how many times I’ve been told by a well-meaning Bayesian some variation of

Bayesian estimation addresses the “small n problem”

This is right and wrong. Continue reading ‘Bayes fixes small n, doesn’t it?’ »
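To preview why it’s both: the posterior is perfectly well defined at any n, but at small n it mostly reflects the prior rather than the data. A minimal Beta-Binomial sketch (the numbers are illustrative, not from the post):

```python
from scipy import stats

# Hypothetical: estimate a rate from n = 5 observations with 4 successes.
successes, n = 4, 5

flat = stats.beta(1 + successes, 1 + n - successes)           # Beta(1, 1) prior
informative = stats.beta(20 + successes, 80 + n - successes)  # prior mean 0.2

print(f"MLE:                  {successes / n:.2f}")    # 0.80
print(f"flat-prior mean:      {flat.mean():.2f}")      # 0.71
print(f"informative mean:     {informative.mean():.2f}")  # 0.23: prior dominates
```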
March 10, 2010, 1:51 pm
I recently sat through some great grad student presentations. Most of those presenting empirical results made a common mistake: they kept way too many digits in their presented results. There are two problems with showing more digits than necessary: false certainty and lack of clarity. The extra certainty is certainly false because we know how accurate the estimated coefficients are: that’s exactly what the standard errors tell us! Extra digits reduce clarity by cluttering up an already hard-to-read table with extra, unnecessary information.
Continue reading ‘Figuring significance significant figures’ »
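One common rule of thumb, which may or may not be the one the post settles on: keep one or two significant figures of the standard error, and round the coefficient to the same decimal place. A minimal sketch:

```python
import math

def round_to_se(coef, se, se_digits=2):
    """Round an estimate so its last digit matches the precision of its
    standard error (keeping `se_digits` significant figures of the SE)."""
    if se <= 0:
        return coef, se
    # Decimal place of the SE's last kept significant digit.
    place = se_digits - 1 - math.floor(math.log10(se))
    return round(coef, place), round(se, place)

# A regression table entry like 1.23456789 (0.04321) carries false precision:
print(round_to_se(1.23456789, 0.04321))   # -> (1.235, 0.043)
```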