Friday, April 20, 2012

quantile regression

Percentile and Quantile Regression for Complex Survey Data

R!
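            > library(survey)
            > library(quantreg)   # provides rq()
            > options(survey.lonely.psu="remove")
            > # nhis: NHIS data frame with design variables psu_p, strat_p and weight wtfa
            > dclus1<-svydesign(id=~psu_p, strata=~strat_p, weights=~wtfa, nest=TRUE, data=nhis)
            > bclus1<-as.svrepdesign(dclus1,type="bootstrap", replicates=100)
            > # y and x are placeholders for an outcome and a covariate in nhis
            > withReplicates(bclus1, quote(coef(rq(y~x, tau=0.5, weights=.weights))))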
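
The snippet above needs a data frame that is not included here. As a self-contained sketch of the same technique, the version below uses the api example data that ships with the survey package (cluster id dnum, weight pw, population size fpc) instead of the NHIS file. It assumes the survey and quantreg packages are installed, and the seed is an arbitrary choice to make the bootstrap replicates reproducible.

```r
library(survey)
library(quantreg)

data(api)  # loads apiclus1 and related example data frames

# One-stage cluster sample: dnum identifies the PSU, pw is the sampling weight
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Convert to a bootstrap replicate-weight design (seed chosen arbitrarily)
set.seed(20120420)
bclus1 <- as.svrepdesign(dclus1, type = "bootstrap", replicates = 100)

# Median regression refitted on every set of replicate weights;
# .weights is the placeholder that withReplicates() fills in
fit <- withReplicates(
  bclus1,
  quote(coef(rq(api00 ~ api99, tau = 0.5, weights = .weights)))
)
print(fit)
```

The printed result contains the two coefficients (intercept and slope) with standard errors computed from the spread of the replicate estimates.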
  • Survey analysis in R by Thomas Lumley.
  • How to deal with singleton/lonely PSUs (How do I analyze survey data with a stratified design with certainty PSUs?):
    • Default: -options(survey.lonely.psu="fail")-, which makes it an error to have a stratum with a single, non-certainty PSU.
    • -options(survey.lonely.psu="remove")- and -options(survey.lonely.psu="certainty")-, which can be set any time after -library(survey)-: a single-PSU stratum then makes no contribution to the variance.
    • -options(survey.lonely.psu="adjust")-, which centers the lonely stratum at the population mean rather than the stratum mean, and -options(survey.lonely.psu="average")-, which replaces the stratum's variance contribution with the average contribution across strata. The "average" option might be appropriate if the lonely PSUs were due to data missing at random rather than to design deficiencies.
    • Difficulties in estimating variances also arise when only one PSU in a stratum has observations in a particular domain or subpopulation. R gives a warning rather than an error when this occurs, and can optionally apply the "adjust" and "average" corrections. To apply the corrections, set -options(survey.adjust.domain.lonely=TRUE)- and set -options(survey.lonely.psu="xxx")- to the adjustment method you want to use.
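
To see how the lonely-PSU settings listed above change the result, here is a minimal sketch on a made-up five-row sample in which stratum 3 has a single PSU. The data frame toy and the helper se_total() are hypothetical names introduced only for this illustration; the sketch assumes the survey package is installed.

```r
library(survey)

# Toy stratified sample (made-up numbers): stratum 3 has only one PSU
toy <- data.frame(
  stratum = c(1, 1, 2, 2, 3),
  psu     = 1:5,
  w       = c(10, 10, 20, 20, 15),
  y       = c(3, 5, 4, 8, 6)
)

# Standard error of the estimated total of y under a given lonely-PSU setting;
# returns NA if the setting makes the computation fail
se_total <- function(mode) {
  old <- options(survey.lonely.psu = mode)
  on.exit(options(old))  # restore the previous setting afterwards
  tryCatch({
    des <- svydesign(id = ~psu, strata = ~stratum, weights = ~w, data = toy)
    as.numeric(SE(svytotal(~y, des)))
  }, error = function(e) NA_real_)
}

modes <- c("fail", "remove", "certainty", "adjust", "average")
res <- sapply(modes, se_total)
print(res)
```

The "fail" entry should come back as NA because the variance computation stops with an error, while "remove" and "certainty" should agree with each other since both drop the lonely stratum's contribution.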
Stata
  • Quantile regression
  • Stata: How do I obtain percentiles for survey data?
  • If you only need point estimates of quantiles, use "_pctile" (stores them in r()), "pctile" (creates variables containing percentiles), or "xtile" (creates a variable containing quantile categories) with pweights, or use "qreg":
    • webuse nhanes2
    • _pctile height [pw=finalwgt], p(10,90)
      • return list
      • disp "10th percentile of height = " r(r1)
    • pctile qhgt=height [pw=finalwgt], nq(4) genp(percent)
    • xtile hgtdec=height[pw=finalwgt], nq(10)
    • table sex [pweight= finalwgt] , c(median age count age) row format(%9.0f)
    • qreg height [pw=finalwgt], quantile(10)
    • mi estimate: qreg height, quantile(10) vsquish
  • If you need standard errors and CIs, use the user-written command "epctile" ("findit epctile", or install directly via "net from http://members.socket.net/~skolenik/stata/").
    • webuse nhanes2
    • svyset psu [pw=finalwgt], strata(strata)
    • epctile height, p(10) svy
    • epctile height, percentiles(10 20 30 50) subpop(if sex==1) svy
    • Or, see my update (3/3/2017) below
  • Replicate Weights and Bootstrap sampling and estimation.
  • The bootstrap can be used to get variances for a complex sample design: -bsweights- (IDEAS) creates the bootstrap weights for designs specified through and supported by svy:
    • bs4rw, rw(brrrwt*): qreg weight i.race if subpop==1 [pw=finalwgt], q(.75)
  • Starting with version 13, -qreg- supports -pweight- and -iweight-. -bootstrap- has options such as -strata()-. Even though -bootstrap- is not designed for complex survey data, its estimates are very similar to those from "withReplicates(design=xxx, quote(coef(rq(y ~ x,tau=0.5, weights=.weights))))" in R.
  • Blog: Doing Bootstrap/Jackknife in Stata for complex survey data
  • Update (3/3/2017): The better approach is to use -svy jackknife- or -svy bootstrap- with jackknife or bootstrap replicate weights. After version 10, you don't need to create jackknife replicate weights with the user-written command -survwgt-; -svy jackknife- can create the weights from the information provided by -svyset-. You do need to provide the bootstrap replicate weights for -svy bootstrap-, which can be created with a user-written command such as -bsweights-:
      • webuse nhanes2, clear
      • svyset psu [pw=finalwgt], strata(strata)
      • bsweights bs_, n(-1) reps(50) seed(4881269)
      • svyset [pw=finalwgt], bsrw(bs_*)
      • svy bootstrap _b: qreg weight i.race, q(.5)
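
For readers working in R rather than Stata, the percentile-with-standard-error step above (the "epctile" part) has a rough counterpart in svyquantile() from the survey package. The sketch below is only an illustration: it uses the api example data that ships with survey rather than nhanes2, and the subpopulation (elementary schools, stype == "E") is an arbitrary stand-in for "subpop(if sex==1)".

```r
library(survey)

data(api)  # example data shipped with the survey package

# One-stage cluster sample, as in the survey package examples
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# 10th percentile and median of api00, with confidence intervals
q_all <- svyquantile(~api00, dclus1, quantiles = c(0.1, 0.5), ci = TRUE)
print(q_all)

# Same quantiles within a subpopulation; subsetting the design object
# (not the raw data frame) keeps the design information intact
q_elem <- svyquantile(~api00, subset(dclus1, stype == "E"),
                      quantiles = c(0.1, 0.5), ci = TRUE)
print(q_elem)
```

The exact layout of the printed output differs between versions of the survey package, so treat the object structure as version-dependent.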
SAS
Articles

1 comment:

Ilias Geo said...

Is it possible to post a solution for multiply imputed data? For example, perform quantile regression, account for multiple imputation and include sampling weights?

Thanks in advance,
Ilias