BACKGROUND: Body mass index (BMI), waist circumference (WC), and the waist-stature ratio (WSR) are considered to be possible proxies for adiposity. OBJECTIVE: The objective was to investigate the relations between BMI, WC, WSR, and percentage body fat (measured by dual-energy X-ray absorptiometry) in adults in a large nationally representative US population sample from the National Health and Nutrition Examination Survey (NHANES). DESIGN: BMI, WC, and WSR were compared with percentage body fat in a sample of 12,901 adults. RESULTS: WC, WSR, and BMI were significantly more correlated with each other than with percentage body fat (P < 0.0001 for all sex-age groups). Percentage body fat tended to be significantly more correlated with WC than with BMI in men but significantly more correlated with BMI than with WC in women (P < 0.0001 except in the oldest age group). WSR tended to be slightly more correlated with percentage body fat than was WC. Percentile values of BMI, WC, and WSR are shown that correspond to percentiles of percentage body fat in increments of 5 percentage points. More than 90% of the sample could be categorized to within one category of percentage body fat by each measure. CONCLUSIONS: BMI, WC, and WSR perform similarly as indicators of body fatness and are more closely related to each other than to percentage body fat. These variables may be inaccurate measures of percentage body fat for an individual, but they correspond fairly well overall with percentage body fat within sex-age groups and distinguish categories of percentage body fat.
Thursday, January 15, 2009
by Martin Bland
In the study of measurement error, we sometimes find that the within-subject variation is not uniform but is proportional to the magnitude of the measurement. It is natural to estimate it in terms of the ratio within-subject standard deviation/mean, which we call the within-subject coefficient of variation.
In our British Medical Journal Statistics Note on the subject, Measurement error proportional to the mean, Doug Altman and I described how to calculate this using a logarithmic method. We take logarithms of the data and then find the within-subject standard deviation. We take the antilog of this and subtract one to get the coefficient of variation.
Alvine Bissery, statistician at the Centre d'Investigations Cliniques, Hôpital européen Georges Pompidou, Paris, pointed out that some authors suggest a more direct approach. We find the coefficient of variation for each subject separately, square these, find their mean, and take the square root of this mean. We can call this the root mean square approach. She asked what difference there is between these two methods.
In practice, there is very little difference between these two ways of estimating within-subject coefficient of variation. They give very similar estimates.
This simulation, done in Stata, shows what happens. (The function invnorm(uniform()) gives a standard Normal random variable.)
. clear
Set sample size to 100.
. set obs 100
obs was 0, now 100
We generate true values for the variable whose measurement we are simulating.
. gen t=6+invnorm(uniform())
We generate measurements x and y, each with error proportional to the true value. (The error standard deviation is t/20, so the true within-subject CV is 0.05, or 5%.)
. gen x = t + invnorm(uniform())*t/20
. gen y = t + invnorm(uniform())*t/20
Calculate the within-subject variance for the natural scale values. (With two measurements per subject, the within-subject variance is the squared difference divided by 2: writing m = (x+y)/2, we have ((x-m)^2 + (y-m)^2)/(2-1) = (x-y)^2/2.)
. gen s2 = (x-y)^2/2
Calculate subject mean and s squared / mean squared, i.e. CV squared.
. gen m=(x+y)/2
. gen s2m2=s2/m^2
Calculate mean of s squared / mean squared.
. sum s2m2
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
s2m2 100 .0021519 .0030943 4.47e-07 .0166771
The within-subject CV is the square root of the mean of s squared / mean squared:
. disp sqrt(.0021519)
.04638858
Hence the within-subject CV is estimated to be 0.046 or 4.6%.
Now the log method. First we log transform.
. gen lx=log(x)
. gen ly=log(y)
Calculate the within-subject variance for the log values.
. gen s2l = (lx-ly)^2/2
. sum s2l
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
s2l 100 .0021566 .003106 4.46e-07 .0167704
The within-subject standard deviation on the log scale is the square root of the mean within-subject variance. The CV is the antilog (the exponential function, since we are using natural logarithms) minus one.
. disp exp(sqrt(.0021566))-1
.04753439
Hence the within-subject CV is estimated to be 0.048 or 4.8%. Compare this with the direct estimate, which was 4.6%. The two estimates are almost the same.
If we average the CVs estimated for each subject, rather than their squares, we do not get the same answer.
Calculate subject CV and find the mean.
. gen cv=sqrt(s2)/m
. sum cv
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
cv 100 .0361173 .0292567 .0006682 .1291399
This gives us the within-subject CV estimate = 0.036 or 3.6%. This is considerably smaller than the estimates by the root mean square method or the log method. The mean CV is not such a good estimate and we should avoid it.
Sometimes researchers estimate the within-subject CV using the mean and within-subject standard deviation for the whole data set. They estimate the within-subject standard deviation in the usual way, as if it were a constant. They then divide this by the mean of all the observations to give a CV. This appears to be a completely wrong approach, as it estimates a single value for a varying quantity. However, it often works remarkably well, though why it does I do not know. It works in this simulation:
. sum x y s2
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
x 100 6.097301 1.012154 3.62283 8.696612
y 100 6.081827 1.000043 3.759932 8.447584
s2 100 .0823188 .1212132 .0000193 .605556
The within-subject standard deviation is the square root of the mean of s2 and the overall mean is the average of the X mean and the Y mean. Hence the estimate of the within-subject CV is:
. disp sqrt(.0823188)/( (6.097301 + 6.081827)/2)
.04711545
So this method gives the estimated within-subject CV as 0.047 or 4.7%. This can be compared to the estimates by the root mean squared CV and the log methods, which were 4.6% and 4.8%. Why this should be I do not know, but it works. I do not know whether it would work in all cases, so I do not recommend it.
We can find confidence intervals quite easily for estimates by either the root mean square method or the log method. For the root mean square method, this is very direct. We have the mean of the squared CV, so we use the usual confidence interval for a mean on this, then take the square root.
. sum s2m2
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
s2m2 100 .0021519 .0030943 4.47e-07 .0166771
The standard error is the standard deviation of the squared CVs divided by the square root of the sample size.
. disp .0030943/sqrt(100)
.00030943
The 95% confidence interval for the squared CV can be found as the mean plus or minus 1.96 standard errors. If the sample is small we should use the t distribution here. However, the squared CVs are unlikely to be Normal, so the CI will still be very approximate.
. disp .0021519 - 1.96*.00030943
.00154542
. disp .0021519 + 1.96*.00030943
.00275838
The square roots of these limits give the 95% confidence interval for the CV.
. disp sqrt(.00154542)
.03931183
. disp sqrt(.00275838)
.05252028
Hence the 95% confidence interval for the within-subject CV by the root mean square method is 0.039 to 0.053, or 3.9% to 5.3%.
For the log method, we can find a confidence interval for the within-subject standard deviation on the log scale. The standard error is sw/root(2n(m-1)), where sw is the within-subject standard deviation, n is the number of subjects, and m is the number of observations per subject.
In the simulation, sw = root(0.0021566) = 0.0464392, n = 100, and m = 2.
Hence the standard error is 0.0464392/root(2 * 100 * (2-1)) = 0.0032837.
The 95% confidence interval is 0.0464392 - 1.96*0.0032837 = 0.0400031 to 0.0464392 + 1.96*0.0032837 = 0.0528753.
Finally, we antilog these limits and subtract one to give confidence limits for the CV: exp(0.0400031)-1 = 0.040814 and exp(0.0528753)-1 = 0.05429817, so the 95% confidence interval for the within-subject CV is 0.041 to 0.053, or 4.1% to 5.3%. These are slightly narrower than the root mean square confidence limits, but very similar.
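The same log-method interval can be computed directly in Stata. Here is a minimal sketch continuing the simulation above (it assumes s2l is still in memory; sum leaves the mean in r(mean) and the number of observations in r(N)):
. quietly sum s2l
. local sw = sqrt(r(mean))
. local se = `sw'/sqrt(2*r(N)*(2-1))
. disp exp(`sw' - 1.96*`se') - 1
. disp exp(`sw' + 1.96*`se') - 1
The two disp commands reproduce the limits calculated by hand above, 0.041 and 0.054.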
I would conclude that either the root mean square method or the log method can be used.
2009 Diabetes Clinical Practice Recommendations
http://care.diabetesjournals.org/content/vol32/Supplement_1/
Thursday, January 08, 2009
http://circ.ahajournals.org/cgi/content/full/107/3/499#SEC6
Markers of Inflammation and Cardiovascular Disease Application to Clinical and Public Health Practice: A Statement for Healthcare Professionals From the Centers for Disease Control and Prevention and the American Heart Association
In 1998, the American Heart Association convened Prevention Conference V to examine strategies for the identification of high-risk patients who need primary prevention. Among the strategies discussed was the measurement of markers of inflammation.1 The Conference concluded that "many of these markers (including inflammatory markers) are not yet considered applicable for routine risk assessment because of: (1) lack of measurement standardization, (2) lack of consistency in epidemiological findings from prospective studies with endpoints, and (3) lack of evidence that the novel marker adds to risk prediction over and above that already achievable through the use of established risk factors." The National Cholesterol Education Program Adult Treatment Panel III Guidelines identified these markers as emerging risk factors,1a which could be used as an optional risk factor measurement to adjust estimates of absolute risk obtained using standard risk factors. Since these publications, a large number of peer-reviewed scientific reports have been published relating inflammatory markers to cardiovascular disease (CVD). Several commercial assays for inflammatory markers have become available. As a consequence of the expanding research base and availability of assays, the number of inflammatory marker tests ordered by clinicians for CVD risk prediction has grown rapidly. Despite this, there has been no consensus from professional societies or governmental agencies as to how these assays of markers of inflammation should be used in clinical practice.
How large an SE/variance is too large?
A relative standard error (RSE) greater than 30% was used to identify unreliable estimates. The RSE is defined as the standard error of the estimate divided by the estimate, multiplied by 100 [RSE = 100 × SE(b) / |b|], which is analogous to the coefficient of variation [CV = SD(b) / |b|].
- Klein (2002): Healthy People 2010 Criteria for Data Suppression (pdf)
- Parker (2017): National Center for Health Statistics data presentation standards for proportions (on page 3: relative CI width calculation and 130% cut-point)
National counts or estimates determined to be unstable are indicated with a footnote in the tables. Fatal injuries were identified as unstable if the number of deaths was <20 or the coefficient of variation (CV) was >30%, where CV = (SE / number of deaths) × 100. Nonfatal injuries were identified as unstable if the national estimate was <1,200, the number of sample cases used was <20, or CV was >30%, where CV = (SE / national estimate) × 100.
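As a minimal made-up illustration of the 30% rule: an estimate of 15 deaths with a standard error of 6 has CV = 100 × 6/15 = 40%, so it would be flagged as unstable. In Stata:
. local b = 15
. local se = 6
. disp "CV = " 100*`se'/abs(`b') " percent"
(The numbers here are hypothetical, chosen only to show the arithmetic.)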
Why are rates based on fewer than 20 cases marked as being unreliable?
by NY State Department of Health: Data Sources and Tools - Chronic Diseases and Conditions
Example of a NHIS article (Variance Estimation and Significance Testing)
"... Standard errors are shown for all percentages in the tables (but not for the frequencies). Estimates with relative standard errors (RSE) of greater than 30% and less than or equal to 50% are considered statistically unreliable and are indicated with an asterisk (*). Estimates with a relative standard error greater than 50% are indicated with a dagger (†) and the estimates are not shown..."
More about reliability
- Singh (2004) "A generalization of the Coefficient of variation with application to suppression of imprecise estimates "
- The National Electronic Injury Surveillance System "A tool for research"
- Precision of measurement by A New View of Statistics
- CV by Wikipedia
- Assessing Product Reliability by Engineering Statistics Handbook
- Reliability by the Research Methods Knowledge Base
Wednesday, January 07, 2009
Comments on : Women may get diabetes earlier than men
O-o-m-m. This conclusion must rest on some assumptions: that the definition of pre-diabetes is correct, that pre-diabetes must progress to diabetes, that the normal levels of these indexes are exactly the same in women and men (no sex difference?), that the incubation period from pre-diabetes to clinical diabetes is exactly the same in women and men, and so on. From my (admittedly poor) memory, there is very solid evidence that women generally live longer than men and that women have a lower risk of heart disease than men. I have no evidence to cast doubt on their findings (these are actually very interesting findings), but I doubt the simple conclusion.
Just some thoughts.
_____________________________________________
Subject: Chicago Health: Women may get diabetes earlier than men
January 7, 2009
Chicago Health
Women May Get Diabetes Earlier Than Men
http://www.nbcchicago.com/health/topics/Women_May_Get_Diabetes_Earlier_Than_Men.html
Women may show signs of diabetes far earlier than men, according to new research. The findings could lead to new diabetes screening procedures to help identify who is at greatest risk of developing the disease.
Researchers from the University of Buffalo studied newly identified risk factors for type 2 diabetes, a metabolic disease in which the body produces insufficient amounts of insulin, the hormone needed for cells to process glucose, or fails to use it. This leads to a buildup of glucose in the bloodstream. Type 2 diabetes increases the risk of developing heart disease, stroke, eye and kidney diseases, and other chronic illnesses.
According to the National Institutes of Health, an estimated 20.8 million Americans—7 percent of the population—had diabetes in 2005, 6.2 million of them undiagnosed. Of the diabetic population, an estimated 90 to 95 percent had the most common form of the disease, type 2 diabetes. In addition, government estimates indicate that at least 43 million Americans have prediabetes, a condition that occurs when blood glucose levels are high but not high enough to be classified as diabetes.
Recent research has shown that levels of chronic sub-acute inflammation, blood clotting factors and dysfunction in the cells lining the inside of arteries may be indicators of diabetes risk factors when tested in the blood.
The Buffalo researchers looked at 1,455 healthy men and women who participated in the Western New York Study between 1996 and 2001. That study tracked alcohol consumption and risk factors for cardiovascular disease. Participants were disease free with no indications of diabetes. They were tested and given physical examinations at the outset and during a six-year follow-up period. The researchers re-examined the participants between 2002 and 2004 and compared the new blood tests to results from 1996-2001.
The blood tests included fasting glucose and insulin levels, C-reactive protein, proinflammatory markers and markers for dysfunction of endothelial tissue lining blood vessels. C-reactive protein (CRP) is a substance produced by the liver that increases whenever there is inflammation in the body. CRP levels rise whenever there is an immune system response or activation. Women in the study had a higher incidence of prediabetes than men. Researchers could not explain why the differences occurred and said more studies are needed.
"Because these pre-diabetic markers are not routinely assessed and because diabetes is strongly linked with coronary heart disease, the study may help explain why the decline in death rates for heart disease in diabetic women lags behind that of diabetic men," lead author Dr. Richard Donahue said in a press release.
Donahue added: "Previous research had shown that hypertension and cholesterol were elevated among women who later developed diabetes. However, current findings that these novel risk factors [markers of endothelial dysfunction, chronic sub-acute inflammation and blood clotting factors] are elevated among women even earlier than previously recognized does suggest that the 'diabetic clock' starts ticking sooner for women than for men."
He suggested that women whose blood glucose levels increase over time should perhaps be screened more intensively for cardiovascular disease.
Saturday, January 03, 2009
- eBooks
- CDC: Public Health 101 Series
- CDC: Crisis & Emergency Risk Communication (CERC)
- UCLA Statistical Computing is a valuable resource for learning statistics and statistical software such as SAS, Stata, and R!.
- Statistics Solutions has some succinct explanations of statistical approaches, including Factor Analysis & SEM
- Zuur (2009). A Protocol for data exploration to avoid common statistical problems
- Age-Standardization and Age-Adjustment
- CDC. 2000 projected US population weight and distribution pattern
- SEER. Standard population for age-adjustment
- Institute of Medicine. The future of public health (1988), The future of the public's health in the 21st century (2003)
- Smelser (2001): International Encyclopedia of the Social & Behavioral Sciences [1st (2001)] [2nd (2015)]
- Lohr (2001): Sample Surveys: Model-based Approaches
- Dr. David Kleinbaum published his ActivEpi in 2001, which is now online for free (ActivEpi website)
- Majid Ezzati (2006): Global Burden of Disease and Risk Factors
- What is epidemiology? and Jokes.
- Interpretation of relative risk
- Poisson regression and count outcome
- Poisson regression and related
- How to calculate confidence interval of incidence rate under the Poisson distribution
- How to get predicted incidence rate using -poisson- of Stata
- How large an SE is too large?
- Walker (2016): A Guide to Section 508 Compliance Using SAS® 9.4 ODS
- Gordon (2014): An exercise in non-linear modeling
- Complex Sampling Survey
- Allen Downey (2014): Think Stats using Python
- Grinstead: Introduction to Probability
- Michael Lavine (2013): Introduction to Statistical Thought
- GRADE website: GRADE guidelines (Grades of Recommendation, Assessment, Development, and Evaluation) (2011)
- CONSORT website- Transparent Reporting of Trials: Guidelines for Reporting Observational & RCT Studies and Flow Diagram (2010)
- STROBE website: STROBE: The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (2007)
- Clinical Analyte Unit Conversion - Jay Clinic Service.
- OpenEpi provides statistics for counts and measurements in descriptive and analytic studies, stratified analysis with exact confidence limits, matched pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, chi-square for dose-response, and links to other useful sites.
- Statistical literacy
- Chart Chooser — the favorite tool for improved Excel and PowerPoint charts. There is an R! version of Chart Chooser (not many charts on the site, but the idea is great)
- Jon's Excel Charts and Tutorials - Peltier Tech
- Stats + Stories: The statistics behind the stories and the stories behind the statistics.
- Ann Emery: Data visualization blogs
- Broman (2017): Data organization in spreadsheets
- Vincent Granville (2014): 10 types of regressions. Which one to use?
- Moderator vs mediator
- Wikipedia: Mediation, Moderation
- David Kenny: Moderator, Mediator
- Baron & Kenny (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
- inference (2018). ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus. Interview of Judea Pearl: To Build Truly Intelligent Machines, Teach Them Cause and Effect
- Distribution (Probability, CDF, and Quantile)
- Wicklin (2018). Fit a distribution from quantiles (SAS)
- Tony Hey (2009): The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft/Publications)
- Yanir Seroussi: Causal Inference reading List
- Missing Imputation
- Blog: Multiple Imputation
- Allison (2014). Sensitivity analysis for not missing at random.
- Stata: Yulia Marchenko (2011). Chained equations and more in multiple imputation in Stata 12
- Survey data imputation
- Wells (2018): Approaches to imputing missing data in complex survey data
- Mukhopadhyay (2016): Survey Data Imputation with PROC SURVEYIMPUTE (Video)
- Resampling and Monte Carlo Simulation
- Latent Class Analysis
- Christopher Baum (2016): Introduction to SEM in Stata
- Jones (2012): A Stata plugin for estimating group-based trajectory models (traj)
- Curran-Bauer (2016): Introduction to Growth Curve Modeling: An Overview and Recommendations for Practice
- Nagin (1999): Analyzing developmental trajectories: a semiparametric, group-based approach
- Training Course
- Linear and nonlinear function/relationships/regression
- Khan Academy: Linear and nonlinear functions (1, 2), Exploring nonlinear relationships
- Richard Williams: Nonlinear relationships, Stata highlights
- Minitab: What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?; Linear or Nonlinear Regression? That Is the Question; Curve Fitting with Linear and Nonlinear Regression
- UCLA: Nonlinear Regression in SAS; Nonlinear or Linear Model
- PennState: Logistic, Poisson, and Nonlinear Regression
- datascience+: First steps with Non-Linear Regression in R!
- StackExchange: How to tell the difference between linear and non-linear regression models?
- StatisticsSolutions: Nonlinear regression
- Wikipedia: Linear function; Nonlinear system; Linear regression; nonlinear regression
- Ruckstuhl: Introduction to Nonlinear Regression
- Motulsky (2016): Fitting curves to data using nonlinear regression
- Haan: What are nonlinear regression functions?
- Brannick: Curvilinear Regression
- Wicklin (2018). Solve a system of nonlinear equations with SAS
- Wicklin (2018). Fit a growth curve in SAS
- Trend analysis
- Blog: Trend Analysis
- NIH: Joinpoint (software)
- Bayesian
- Kruschke (2013). Bayesian estimation supersedes the t test
- bayestestR: Become a Bayesian master you will
- McElreath: Statistical Rethinking: Bayesian statistics using R & Stan open access online.
- Scott Cunningham: Causal Inference: the Mixtape (using Stata)
- Stata
- Blog: Stata - my first Stata program
- Stata Online Help and Document
- The Stata Journal
- Tips of Stata
- Dickman. Estimating and modelling relative survival using Stata (strs, stnet)
- Drukker. Programming an estimation command in Stata: a map to posted entries
- Sribney. How can I estimate correlations and their level of significance with survey data
- margins - undocumented and underdocumented features (see the sketch after this list)
- margins, gen() creates variables with predictions for each observation
- margins, at(varname=gen(exp)) generates the values at which predictions are made
- margins dis, at(age=gen(age)) gives the average prediction by dis at the observed age, which is equal to margins dis
- margins, at(age=gen(age+1)) gives the average prediction by dis at the observed age plus 1, which is equal to: .replace age=age+1 .margins dis
- margins dis, at(age=gen(age) age=gen(age+1)) combines both of the above in one call
- Average prediction at the observed age plus one standard deviation: .sum age .local sd=r(sd) .margins dis, at(age=gen(age+`sd'))
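- A self-contained sketch of these margins patterns, using Stata's auto data with foreign standing in for dis and weight standing in for age (these stand-ins are my assumption, not from the links above):
. sysuse auto, clear
. regress mpg i.foreign c.weight
. margins foreign
. margins foreign, at(weight=gen(weight))
. margins foreign, at(weight=gen(weight+100))
. quietly sum weight
. local sd = r(sd)
. margins foreign, at(weight=gen(weight+`sd'))
The first two margins calls give identical results, matching the note above; the last two shift weight by a constant and by one standard deviation.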
- Princeton University. Online Stata Tutorial at DSS
- Tiberlake
- Bayesian analysis in Stata 15
- Stata Tips #7 - dyntext, dyndoc and user-written commands (version 15)
- Stata Tips #8 spatial analysis in Stata 15 (version 15)
- Herrera (2017). Spatial econometrics methods using Stata
- Kondo (2015) Hot and cold spot analysis using Stata (The Stata Journal)
- Pisati (2010). Exploratory spatial data analysis using Stata
- How to estimate intraclass correlation with survey data (VIF)? (Link1, Link2)
- use "correlate" with aweight (it is equivalent to pweight) for point estimates of the correlation coefficient.
- use "svy: regress" for p-values. Do "svy: regress y x" and "svy:regress x y" and take the biggest p-value, which is the conservative thing to do.
- You might try the "corr_svy" statement which a module to compute correlation tables for survey data. It's based on the Sribney's procedures mentioned above.
- Or, you can get the correlation coefficient using "svy: regress y x", then "disp sqrt(e(r2))" to show coefficient (here e(r2) has squared R value. You can also calculated tolerance using "disp 1-e(r2)" and VIF (variance inflation factor) using "disp 1/(1-e(r2))" and , the A rule of thumb is that if VIF>10 then you need examine multicollinearity further.
- Alternative for VIF calculation: "regress y x z [pw=srvyweight]", then "estat vif"
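- Putting the commands above together, a minimal sketch (y, x, z, and srvyweight are placeholder names, and the single-line svyset is an assumed minimal design):
. svyset [pw=srvyweight]
. svy: regress y x
. disp "r = " sqrt(e(r2))
. disp "tolerance = " 1 - e(r2)
. disp "VIF = " 1/(1 - e(r2))
. regress y x z [pw=srvyweight]
. estat vif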
- Cross Validation
- Trevor Hastie's The Elements of Statistical Learning is a good and free book for more information. Thanks to Hastie.
- k-fold cross validation
- Stata: user written program crossfold (help file).
- R: Petr Keil (2013): AIC & BIC vs. Crossvalidation using R!.
- SAS: Using Validation and Cross Validation using PROC GLMSELECT.
- Deming: Cross Validation Using SAS
- Net reclassification improvement (NRI) Pencina (2011): Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers (an example of application of cross-validation).
- Data Visualization
- Survival Analysis
- Stephen Jenkins: Survival Analysis with Stata (U of Essex)
- Princeton German Rodriguez: Survival Analysis Pop 509 course notes
- Roberto Gutierrez: On Frailty Models in Stata (used the same dataset (bc.dta) by Jenkins' course)
- Austin (2017): A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications
- How to split single observation into multiple observations by event time (Lexis Diagram): Stata - stsplit, R! - survival::survSplit and Epi::Lexis, SAS - Lexis.sas (pdf)
- Multilevel and Small Area Estimation (SAE) Analysis
- Princeton German Rodriguez: Multilevel Models Pop 510 course notes
- NIH (2000). Progress and promise in research on social and cultural dimensions of health - A research agenda (Video)
- University of Bristol. Centre for Multilevel Modelling
- Joop Hox (the author of Multilevel Analysis) has papers, programs, and lectures to download on his homepage.
- Rabe-Hesketh (2006). Multilevel modelling of complex survey data (Slides 2007)
- Multilevel models for complex survey data - The slides/articles of tutorial at 2011 BRFSS conference
- Paul Allison (2017). Using "Between-Within" models to estimate contextual effects
- Suchindran. Sampling weights and Regression Analysis
- Zaccarin (2008). The effects of sampling weights in multilevel analysis of PISA data. (Slides - Kiel 2009)
- Carle (2009). Fitting multilevel models in complex survey data with design weights: Recommendations
- D’Agostino (SASGF 2013). Multilevel Reweighted Regression Models to Estimate County-Level Racial Health Disparities Using PROC GLIMMIX
- Chantala. Software to Compute Sampling Weights for Multilevel Analysis
- UCLA papers on multilevel modeling
- Effects of Multicollinearity on Completed Model
- Suzuki (2012). Clarifying the use of aggregated exposures in multilevel models. This article discusses the multicollinearity issue between an individual-level variable and the aggregated variable.
- Feaster (2011). Multilevel models to identify contextual effects on individual group member outcomes
- Andrés Gutiérrez. Small Area Estimation 101
- Sun (2015) Analysis of spatial and temporal data
- Spatiotemporal analysis
- Skrondal (2009). Prediction in multilevel generalized linear models
- SAS
- PROC GLIMMIX (pdf v13.1) (SAS Documentation)
- Example 43.18 Weighted Multilevel Model for Survey Data (v13.1)
- Smith (2012). SAS Proc GLIMMIX for spatial analysis
- Kiernan (2012). Tips and Strategies for Mixed Modeling with SAS/STAT® Procedures
- Predictive Modeling Tips | Free Best Practices Guide
- Stata
- Multilevel mixed-effects models
- me: multilevel models to fit random-intercept and random-slope models.
- xt: Random-effects panel-data estimators
- Multilevel linear models in Stata, part 1: Components of variance (xt, Video)
- Multilevel linear models in Stata, part 2: Longitudinal data (xt, Video)
- Multilevel generalized linear models (me, Video)
- User-written program: gllamm
- Huber (2014). How to simulate multilevel/longitudinal data
- Rabe-Hesketh (2008). Prediction in multilevel logistic regression
- Blogs etc.
- Statistics Notes in the BMJ
- Statistical Horizons Blog (Paul Allison)
- Rick Wicklin Blog (SAS)
- R Bloggers (R!)
- The Stata Blog (Stata)
- Bendix Carstensen (R! and Age-Period-Cohort analysis)
- medRxiv.org: the preprint server for health sciences
- arXiv.org: Open access to e-prints in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics
- Allison (2018). For causal analysis of competing risks, don't use Fine & Gray's subdistribution method
- Kevin Markham (2015). Should you teach Python or R for data science?
- Allison (2014). Prediction vs. Causation in Regression Analysis
- Norm Matloff (2014). Why are we still teaching T-tests?
- Rick Wicklin (2014). How to choose colors for maps and heat maps.
- Nathan Yau. How to Visualize and Compare Distributions
- Daniel Lakens: The 20% Statistician
- Estimating the covariance of the means from two samples?
- Jonas Kristoffer Lindelov: Common statistical tests are linear models (Chinese version)
- Rodriguez. Generalized linear models - 7.4 the piecewise exponential model
- Statistical Reflections of a Medical Doctor (2012). Survival Analysis via Hazard Based Modeling and Generalized Linear Models
- How to do bootstrapping/Jackknife using Stata?
- Journal: Significance communicates and demonstrates statistical practice in an entertaining, thought-provoking, and non-technical way.
- Journal: Statistical Science has some good philosophical articles and interviews with experts.
- Journal: Survey Statistician
- Public-accessible Health-related Datasets
- Google Public Data Explorer The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings.
- How to Estimate Percentiles and Confidence Intervals
- Timeline of Statistics (pdf)
- Distinguished Lectures of The Joint Program in Survey Methodology, University of Maryland
- Sorensen: The Use and Misuse of the Coefficient of Variation in Organizational Demography Research
- Predict a value and estimate the variance of a single response instead of average response, individual vs. marginal, standard error of prediction (Engineering Statistics Handbook, SAS JMP, StackExchange, R-bloggers, Martha Smith) marginal distribution (Statistics How To, Jason Brownlee, StackExchange, Khan Academy)
- UCLA: Delta Method in R!
- Video Clip/Webinar
- Hans Rosling shows the best stats you've ever seen
- David McCandless: The beauty of data visualization
- Demo: Stunning data visualization in the AlloSphere
- US Department of Veterans Affairs: Cyberseminar of HSR&D (Health Services Research & Development)
- Science Webinar (2019): Selling without selling out: How to communicate your science
- ESRI: Spatial Statistics Presentations
- SAS: A hands-on introduction to SAS data step hash programming techniques
- Math
- Why is the limit (1−1/n)^n equal to 1/e?
- Limit of (1+x/n)^n when n tends to infinity
- L'Hôpital's Rule is a powerful technique for finding the limit of an indeterminate form 0/0 or ∞/∞. What we need to do is differentiate the numerator and denominator and then take the limit (see the worked example after this list)
- Alder (2001): An introduction to mathematical modelling
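As a worked example tying the three math items above together, here is the 1/e limit done with logarithms and L'Hôpital's rule (a sketch in LaTeX notation):

\lim_{n\to\infty}\left(1-\frac{1}{n}\right)^{n} = \exp\left(\lim_{n\to\infty} n\ln\left(1-\frac{1}{n}\right)\right).

Substituting x = 1/n turns the inner limit into the indeterminate form 0/0, so L'Hôpital's rule applies:

\lim_{x\to 0^{+}}\frac{\ln(1-x)}{x} = \lim_{x\to 0^{+}}\frac{-1/(1-x)}{1} = -1,

and therefore the limit is e^{-1} = 1/e. The same substitution shows that (1+x/n)^n tends to e^x as n tends to infinity.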