Monday, October 15, 2007

SUDAAN and Collinearity

Tips: SUDAAN and Collinearity
Dear B,

I have another question about SUDAAN. The following article states that -
Q1: Does SUDAAN really control for collinearity by default?
Q2: If not, how do I test it?
Again, I really appreciate your help.
X

Dear X,
Q1: Does SUDAAN really control for collinearity by default?
SUDAAN does not control for collinearity. When the betas (coefficients) become very large or very small (e.g., in logistic regression), SUDAAN outputs whatever estimates it computes and issues a warning about the possible existence of collinear variables. As far as I know, no other available software controls for multicollinearity either. Some packages (e.g., SAS) let you request statistics to assess multicollinearity, and there are actions you can take to alleviate the problem.
Q2: If not, how do I test it?
As you may know, multicollinearity in logistic regression models (I am assuming that you are working with logistic regression) results from strong correlations between independent variables.
Effect of multicollinearity:
Multicollinearity inflates the variances of the parameter estimates. Particularly for small and moderate sample sizes, this may result in a lack of statistical significance for individual independent variables even though the overall model is strongly significant.
Multicollinearity may also produce wrong signs and magnitudes in the regression coefficient estimates, and consequently incorrect conclusions about the relationships between independent and dependent variables.
How to detect multicollinearity?
1-Start by examining the correlations (continuous and ordinal variables) and associations (nominal variables) between independent variables.
However, this may not be sufficient in situations where no pair of variables is highly correlated but several variables are involved in interdependencies.
2-It is better to use multicollinearity diagnostic statistics produced by linear regression analysis (PROC REG with options VIF TOL in SAS). 
For nominal independent variables, create dummy variables for each category except one (it will become a reference category). Use the dependent variable from logistic regression analysis or any other variable that is not one of the independent variables, as a dependent variable in the linear regression. The collinearity diagnostic statistics are based on the independent variables only, so the choice of the dependent variable does not matter.
Examine the Tolerance and Variance Inflation Factor for each variable. For each independent variable, Tolerance = 1 – Rsq, where Rsq is the coefficient of determination for the regression of that variable on all remaining independent variables, so low Tolerance values indicate high multivariate correlation.
The Variance Inflation Factor (VIF) is 1/Tolerance. It is always >= 1, and it is the factor by which the variance of the corresponding parameter estimate is inflated by multicollinearity, compared with what it would be if there were no multicollinearity.
There is no formal VIF cutoff for determining the presence of multicollinearity. Values of VIF exceeding 10 (that is, Rsq above 0.90) are often regarded as indicating multicollinearity, but in weaker models, which is often the case in logistic regression, values above 2.5 (Rsq above 0.60) may be a cause for concern (Reference: P.D. Allison, Logistic Regression Using the SAS System, SAS Institute).
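To make the two detection steps above concrete, here is a minimal sketch in Python (not SAS or SUDAAN output). It uses simulated data and hypothetical names (x1 to x4, vif_and_tolerance), and it relies on the fact that the VIF of each variable equals the corresponding diagonal element of the inverse of the correlation matrix of the independent variables. The fourth variable is built as the sum of the other three plus a little noise, so by construction no single pairwise correlation is very high, yet the VIFs should come out large.

```python
import random
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    k = len(columns)
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((columns[i][r] - means[i]) * (columns[j][r] - means[j])
                      for r in range(n)) / n
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfect collinearity: correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """Return (VIFs, Tolerances); VIF_j is the j-th diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    vifs = [inv[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]


# Simulated, purely illustrative data (an assumption, not from any real survey).
random.seed(0)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
x4 = [a + b + c + random.gauss(0, 0.2) for a, b, c in zip(x1, x2, x3)]
cols = [x1, x2, x3, x4]

corr = correlation_matrix(cols)
largest_pairwise = max(abs(corr[i][j])
                       for i in range(4) for j in range(4) if i != j)
vifs, tols = vif_and_tolerance(cols)

print("largest pairwise |r|:", round(largest_pairwise, 2))
for name, v, t in zip(["x1", "x2", "x3", "x4"], vifs, tols):
    print(name, "VIF =", round(v, 1), "Tolerance =", round(t, 3))
```

Only the independent variables enter the calculation, which matches the point above that the choice of dependent variable does not matter for these diagnostics.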
What to do about multicollinearity?
In some cases, the variables involved in multicollinearity can be combined into a single variable. If combining variables does not make sense, then some of the variables causing multicollinearity need to be dropped from the model.
Examining the correlations between variables, and taking into account the practical importance of each variable, helps in deciding which variables to drop from the model.
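As a rough illustration of the "combine" option, the sketch below (Python, simulated data, all names hypothetical) takes two noisy measurements of the same underlying quantity, m1 and m2, plus an unrelated variable called other. It computes the VIF of m1 before combining, then replaces m1 and m2 with the average of their z-scores and recomputes. Averaging z-scores is only one possible way to combine variables and is an assumption here. The VIFs use the closed-form Rsq for the two- and three-predictor cases, so this is not a general-purpose routine.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def zscores(x):
    """Standardize a list to mean 0 and standard deviation 1."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return [(v - m) / s for v in x]


def vif_two_predictors(x, other):
    """With exactly two predictors, Rsq = r^2, so VIF = 1 / (1 - r^2)."""
    r = pearson(x, other)
    return 1.0 / (1.0 - r * r)


def vif_three_predictors(target, p, q):
    """VIF of `target` given predictors p and q, using the closed form for Rsq."""
    r_tp, r_tq, r_pq = pearson(target, p), pearson(target, q), pearson(p, q)
    rsq = (r_tp ** 2 + r_tq ** 2 - 2 * r_tp * r_tq * r_pq) / (1 - r_pq ** 2)
    return 1.0 / (1.0 - rsq)


# Simulated, purely illustrative data (an assumption, not real survey data).
random.seed(1)
n = 500
latent = [random.gauss(0, 1) for _ in range(n)]
m1 = [v + random.gauss(0, 0.2) for v in latent]   # first noisy measurement
m2 = [v + random.gauss(0, 0.2) for v in latent]   # second noisy measurement
other = [random.gauss(0, 1) for _ in range(n)]    # unrelated predictor

vif_before = vif_three_predictors(m1, m2, other)

# Combine the two collinear variables into one composite score.
combined = [(a + b) / 2 for a, b in zip(zscores(m1), zscores(m2))]
vif_after = vif_two_predictors(combined, other)

print("VIF of m1 before combining:", round(vif_before, 1))
print("VIF of composite after combining:", round(vif_after, 2))
```

Whether a composite like this makes sense is a subject-matter question; when it does not, dropping one of the two variables has a similar effect on the VIFs.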
Hope it helps.
Best,
B
  



3 comments:

Unknown said...

you should cite your sources...
i.e. http://www.uky.edu/ComputingCenter/SSTARS/MulticollinearityinLogisticRegression.htm

Yiling J Cheng said...
This comment has been removed by the author.
Yiling J Cheng said...

I just checked the website Jingo posted. I agree that the two are almost copies of each other; the text on my blog was provided by a friend. Thanks for sharing.