How useful are normality tests?
source: graphpad.com
Many data analysis methods (t test, ANOVA, regression) depend on the assumption that data were sampled from a Gaussian distribution. The best way to evaluate how far your data are from Gaussian is to look at a graph and see if the distribution deviates grossly from a bell-shaped normal distribution.
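One common way to "look at a graph" is a normal quantile-quantile (Q-Q) plot: if the points hug the diagonal line, the data are at least roughly Gaussian. The short sketch below is an illustration only, written in Python with SciPy and matplotlib (not how Prism draws its graphs), using made-up data:

```python
# Minimal sketch of a graphical normality check: a normal Q-Q plot.
# Assumes NumPy, SciPy, and matplotlib are available; the data are simulated.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=1.5, size=80)   # example data, roughly Gaussian

stats.probplot(x, dist='norm', plot=plt)      # normal Q-Q plot against a Gaussian
plt.title("Normal Q-Q plot of the sample")
plt.show()
```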
How useful are statistical methods to test for normality? Less useful than you’d guess. Consider these potential problems:
- Small samples almost always pass a normality test. Normality tests have little power to tell whether or not a small sample of data comes from a Gaussian distribution.
- With large samples, minor deviations from normality may be flagged as statistically significant, even though small deviations from a normal distribution won’t affect the results of a t test or ANOVA. (Both of these problems are illustrated in the simulation sketch after this list.)
- Decisions about when to use parametric vs. nonparametric tests should usually be made to cover an entire series of analyses. It is rarely appropriate to make the decision based on a normality test of one data set.
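To make the first two problems concrete, here is a minimal simulation sketch in Python with NumPy and SciPy (not Prism); the mildly skewed "population" is invented purely for illustration. Typically the same non-Gaussian population passes a Shapiro-Wilk test when the sample is small, yet is flagged as significantly non-normal when the sample is large, even though the skew is too mild to matter for a t test or ANOVA.

```python
# Simulation sketch: the same mildly skewed population usually passes a normality
# test with a small sample but is flagged with a large one.
# Assumes NumPy and SciPy; data and sample sizes are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def skewed_sample(n):
    # Mildly right-skewed data: exponentiate small Gaussian noise.
    return np.exp(rng.normal(loc=0.0, scale=0.25, size=n))

for n in (20, 2000):
    x = skewed_sample(n)
    w, p = stats.shapiro(x)          # Shapiro-Wilk normality test
    verdict = 'fails to detect' if p > 0.05 else 'flags'
    print(f"n = {n:5d}  Shapiro-Wilk P = {p:.4f}  {verdict} the mild skew")
```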
Which normality test is best?
When we added a normality test to Prism several years ago, we selected the one that was best known, the Kolmogorov-Smirnov test. This test compares the cumulative distribution of the data with the expected cumulative Gaussian distribution, and bases its P value on the largest discrepancy.
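As a sketch of what "largest discrepancy" means, the snippet below (Python with NumPy and SciPy assumed, simulated data) computes the biggest vertical gap D between the empirical cumulative distribution and a fully specified Gaussian by hand, then checks it against SciPy's one-sample Kolmogorov-Smirnov test:

```python
# Sketch of the K-S statistic: the largest gap between the empirical CDF of the
# data and the cumulative Gaussian, here with mean 0 and SD 1 fixed in advance.
# Assumes NumPy and SciPy; the data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.sort(rng.normal(loc=0.0, scale=1.0, size=50))

ecdf_hi = np.arange(1, len(x) + 1) / len(x)   # empirical CDF just after each point
ecdf_lo = np.arange(0, len(x)) / len(x)       # empirical CDF just before each point
gauss = stats.norm.cdf(x)                     # hypothesized Gaussian CDF, N(0, 1)
d = max(np.max(ecdf_hi - gauss), np.max(gauss - ecdf_lo))  # largest discrepancy

d_ref, p_ref = stats.kstest(x, 'norm')        # SciPy's one-sample K-S test
print(f"by hand D = {d:.4f}, SciPy D = {d_ref:.4f}, P = {p_ref:.4f}")
```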
The Kolmogorov-Smirnov test was designed to compare two experimentally determined distributions. When testing for normality, you compare your data against a hypothetical Gaussian whose mean and SD are estimated from those same data, so it's necessary to apply the Dallal-Wilkinson-Lilliefors correction. Prism versions 4.02 and 4.0b do this, but earlier versions did not.
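If you work outside Prism, a Lilliefors-type corrected test is available in statsmodels. The sketch below (Python with NumPy, SciPy, and statsmodels assumed, simulated data) contrasts it with the uncorrected K-S P value obtained by plugging the sample mean and SD back into the test; the statsmodels correction is table-based and similar in spirit to, though not identical with, the Dallal-Wilkinson approximation Prism applies.

```python
# Sketch: plain K-S vs. a Lilliefors-corrected K-S test when the Gaussian's mean
# and SD are estimated from the same data. The naive P value is typically too
# high (the test loses power) because estimating the parameters makes the fit
# look better than it should. Assumes NumPy, SciPy, and statsmodels.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=100)   # simulated data

# Naive K-S: mean and SD estimated from the data, no correction applied.
d_naive, p_naive = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))

# Lilliefors-corrected K-S test for normality with estimated parameters.
d_lf, p_lf = lilliefors(x, dist='norm')

print(f"naive K-S:      D = {d_naive:.3f}, P = {p_naive:.3f}")
print(f"Lilliefors K-S: D = {d_lf:.3f}, P = {p_lf:.3f}")
```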
The Kolmogorov-Smirnov test is based on a simple way to quantify the discrepancy between the observed and expected distributions. It turns out, however, that it is too simple, and doesn't do a good job of discriminating whether or not your data were sampled from a Gaussian distribution. An expert on normality tests, R.B. D’Agostino, makes a very strong statement: “The Kolmogorov-Smirnov test is only a historical curiosity. It should never be used.” (“Tests for Normal Distribution” in Goodness-of-Fit Techniques, Marcel Dekker, 1986).
In Prism 4.02 and 4.0b, we added two more normality tests to Prism’s column statistics analysis: the Shapiro-Wilk normality test and the D’Agostino-Pearson omnibus test.
All three procedures test the same null hypothesis: that the data are sampled from a Gaussian distribution. The P value answers this question: if the null hypothesis were true, what is the chance of randomly sampling data that deviate as much (or more) from Gaussian as the data we actually collected? The three tests differ in how they quantify the deviation of the actual distribution from a Gaussian distribution.
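The sketch below (Python with NumPy, SciPy, and statsmodels assumed, made-up mildly skewed data) runs all three tests on the same sample, so you can see that they ask the same question yet can return noticeably different P values:

```python
# All three normality tests applied to the same mildly skewed, simulated sample.
# Assumes NumPy, SciPy, and statsmodels.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(5)
x = np.exp(rng.normal(scale=0.3, size=100))   # mildly right-skewed data

_, p_ks = lilliefors(x, dist='norm')          # K-S with Lilliefors correction
_, p_sw = stats.shapiro(x)                    # Shapiro-Wilk
_, p_dp = stats.normaltest(x)                 # D'Agostino-Pearson omnibus

print(f"K-S (Lilliefors):   P = {p_ks:.4f}")
print(f"Shapiro-Wilk:       P = {p_sw:.4f}")
print(f"D'Agostino-Pearson: P = {p_dp:.4f}")
```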
The Shapiro-Wilk normality test is difficult for nonmathematicians to understand, and it doesn't work well when several values in your data set are the same. In contrast, the D’Agostino-Pearson omnibus test is easy to understand. It first analyzes your data to compute skewness (to quantify the asymmetry of the distribution) and kurtosis (to quantify the heaviness of the distribution's tails relative to a Gaussian). It then calculates how far each of these values differs from the value expected with a Gaussian distribution, and computes a single P value from the sum of the squares of these discrepancies. Unlike the Shapiro-Wilk test, this test is not affected if the data contain identical values.
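Here is that recipe written out as a short sketch (Python with NumPy and SciPy assumed, simulated data): the z scores for skewness and kurtosis are squared and summed, and the sum is referred to a chi-square distribution with two degrees of freedom. SciPy's normaltest follows the same recipe, so the two answers should match.

```python
# Sketch of the D'Agostino-Pearson omnibus calculation, done "by hand" and then
# compared with scipy.stats.normaltest. Assumes NumPy and SciPy; simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=200)

z_skew, _ = stats.skewtest(x)        # z score for sample skewness
z_kurt, _ = stats.kurtosistest(x)    # z score for sample kurtosis
k2 = z_skew**2 + z_kurt**2           # omnibus statistic: sum of squared z scores
p = stats.chi2.sf(k2, df=2)          # P value from chi-square with 2 df

k2_ref, p_ref = stats.normaltest(x)  # SciPy's D'Agostino-Pearson omnibus test
print(f"by hand: K2 = {k2:.3f}, P = {p:.4f}")
print(f"SciPy:   K2 = {k2_ref:.3f}, P = {p_ref:.4f}")
```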
If you decide to use normality tests, I recommend that you stop using the Kolmogorov-Smirnov test and switch instead to the D’Agostino-Pearson omnibus test. But first rethink whether normality tests are providing you with useful information.