Wednesday, March 24, 2010

Why Dichotomizing Variables is a Bad Idea
This is a nice review from a research perspective, but avoiding dichotomization may not always be practicable. For example, the diagnostic criteria for diabetes and hypertension are themselves cutpoints applied to continuous blood glucose and blood pressure measurements.

_From Bob

        It is well recognized in the methodological literature that dichotomization of continuous variables introduces major problems in the analysis and interpretation of models derived in a data-dependent fashion. Nevertheless, dichotomization of continuous variables is widespread in clinical research. Problems include loss of information, reduction in power, uncertainty in defining the cutpoint, arriving at a biologically implausible step function as the estimate of a dose–response function, and the impossibility of detecting a non-monotonic dose–response relation. Uncertainty in how to select a ‘sensible’ cutpoint to group a continuous variable into two classes has led researchers to use either the median or an ‘optimal’ cutpoint. The latter approach gives a highly inflated type 1 error probability, together with biased parameter estimates and variances that are too small [9, 11]. Although some remedies for these difficulties have been developed [9, 21–23], none of the authors of these papers actually recommends the use of ‘optimal’ cutpoints with their proposed corrections. In general, the situation seems hardly to have improved since the advice in 1993 of Maxwell and Delaney [1] to avoid dichotomization, quoted at the beginning of this paper.
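To see why the ‘optimal’ cutpoint inflates the type 1 error, here is a rough simulation sketch in Python (my own illustration, not code from the paper). It assumes a continuous outcome that is truly unrelated to the predictor, splits the predictor at every cutpoint between its 10th and 90th percentiles, and compares the two group means with a large-sample z-test. The function names, sample size, number of simulations and percentile range are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sample_p(n1, s1, ss1, n2, s2, ss2):
    """Two-sided p-value of a large-sample z-test comparing two group means,
    given each group's size, sum and sum of squares (unequal variances)."""
    m1, m2 = s1 / n1, s2 / n2
    v1 = (ss1 - n1 * m1 * m1) / (n1 - 1)  # sample variance, group 1
    v2 = (ss2 - n2 * m2 * m2) / (n2 - 1)  # sample variance, group 2
    z = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def split_p_values(x, y, lo_frac=0.1, hi_frac=0.9):
    """p-value for each cutpoint of x between the lo_frac and hi_frac
    quantiles, keyed by the size k of the 'low' group."""
    ys = [v for _, v in sorted(zip(x, y))]  # outcomes ordered by x
    n = len(ys)
    pre_s, pre_ss = [0.0], [0.0]  # prefix sums make each cutpoint O(1)
    for v in ys:
        pre_s.append(pre_s[-1] + v)
        pre_ss.append(pre_ss[-1] + v * v)
    tot_s, tot_ss = pre_s[-1], pre_ss[-1]
    out = {}
    for k in range(int(lo_frac * n), int(hi_frac * n) + 1):
        # low group = the k smallest x values, high group = the rest
        out[k] = two_sample_p(k, pre_s[k], pre_ss[k],
                              n - k, tot_s - pre_s[k], tot_ss - pre_ss[k])
    return out


def type1_error(n=100, n_sims=500, alpha=0.05, seed=1):
    """Empirical rejection rates under the null for a pre-specified median
    split versus the data-driven minimum-p ('optimal') cutpoint."""
    rng = random.Random(seed)
    median_hits = optimal_hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x: null is true
        p = split_p_values(x, y)
        if p[n // 2] < alpha:          # cut fixed in advance at the median
            median_hits += 1
        if min(p.values()) < alpha:    # cut chosen to minimise the p-value
            optimal_hits += 1
    return median_hits / n_sims, optimal_hits / n_sims


if __name__ == "__main__":
    med, opt = type1_error()
    print(f"median split: {med:.3f}  'optimal' cutpoint: {opt:.3f}")
```

With these settings the median split should reject close to the nominal 5%, while the minimum-p search rejects several times more often even though there is no association at all; the exact figures depend on the seed, sample size and cutpoint range.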
Instead of dichotomizing a continuous variable, we prefer to obtain a prognostic index by methodology which combines selection of variables with selection of functions for continuous variables [4, 26]. As stated in an editorial [2] in an epidemiological journal a decade ago, ‘these elegant approaches [fractional polynomials and splines] merit a larger role in epidemiology.’ Clinical researchers should in general avoid dichotomization at the model-building stage and adopt more powerful methods.
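As a taste of the alternative the authors favour, here is a deliberately simplified Python sketch of first-degree fractional polynomial (FP1) selection: fit y = b0 + b1·x^p for each power p in the conventional set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where p = 0 means log x, and keep the power with the smallest residual sum of squares. It assumes a single positive predictor and a normally distributed outcome, and it leaves out the FP2 models, closed test procedure and variable selection that the full multivariable method combines. All function names are hypothetical, and a median split is fitted alongside for comparison.

```python
import math
import random

# Conventional FP power set; power 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """FP1 basis function: x**p, with p == 0 meaning log(x). Requires x > 0."""
    return math.log(x) if p == 0 else x ** p


def ols_rss(z, y):
    """Residual sum of squares of the simple linear regression y = b0 + b1*z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    syy = sum((b - my) ** 2 for b in y)
    return syy - szy * szy / szz


def best_fp1(x, y):
    """Return (power, rss) of the best-fitting FP1 model for y on x."""
    fits = [(p, ols_rss([fp_transform(v, p) for v in x], y)) for p in FP_POWERS]
    return min(fits, key=lambda t: t[1])


def median_split_rss(x, y):
    """RSS when x is replaced by an indicator of being at or above its median."""
    cut = sorted(x)[len(x) // 2]
    return ols_rss([1.0 if v >= cut else 0.0 for v in x], y)


if __name__ == "__main__":
    rng = random.Random(3)
    x = [rng.uniform(0.1, 10.0) for _ in range(200)]
    y = [math.log(v) + rng.gauss(0, 0.1) for v in x]  # true curve is log(x)
    print("best FP1 (power, RSS):", best_fp1(x, y))
    print("median-split RSS:", median_split_rss(x, y))
```

When the true relation is logarithmic, the search should pick p = 0 and leave far less residual variation than the two-level step function produced by the median split.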

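Royston, Patrick, Altman, Douglas G. and Sauerbrei, Willi (2006) Dichotomizing continuous predictors in multiple regression: a bad idea. Statistics in Medicine 25:127-141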
