BACKGROUND: Body mass index (BMI), waist circumference (WC), and the waist-stature ratio (WSR) are considered to be possible proxies for adiposity. OBJECTIVE: The objective was to investigate the relations between BMI, WC, WSR, and percentage body fat (measured by dual-energy X-ray absorptiometry) in adults in a large nationally representative US population sample from the National Health and Nutrition Examination Survey (NHANES). DESIGN: BMI, WC, and WSR were compared with percentage body fat in a sample of 12,901 adults. RESULTS: WC, WSR, and BMI were significantly more correlated with each other than with percentage body fat (P < 0.0001 for all sex-age groups). Percentage body fat tended to be significantly more correlated with WC than with BMI in men but significantly more correlated with BMI than with WC in women (P < 0.0001 except in the oldest age group). WSR tended to be slightly more correlated with percentage body fat than was WC. Percentile values of BMI, WC, and WSR are shown that correspond to percentiles of percentage body fat in increments of 5 percentage points. More than 90% of the sample could be categorized to within one category of percentage body fat by each measure. CONCLUSIONS: BMI, WC, and WSR perform similarly as indicators of body fatness and are more closely related to each other than to percentage body fat. These variables may be inaccurate measures of percentage body fat for an individual, but they correspond fairly well overall with percentage body fat within sex-age groups and distinguish categories of percentage body fat.
Thursday, January 15, 2009
by Martin Bland
In the study of measurement error, we sometimes find that the within-subject variation is not uniform but is proportional to the magnitude of the measurement. It is natural to estimate it in terms of the ratio within-subject standard deviation/mean, which we call the within-subject coefficient of variation.
In our British Medical Journal Statistics Note on the subject, Measurement error proportional to the mean, Doug Altman and I described how to calculate this using a logarithmic method. We take logarithms of the data and then find the within-subject standard deviation. We take the antilog of this and subtract one to get the coefficient of variation.
Alvine Bissery, statistician at the Centre d'Investigations Cliniques, Hôpital européen Georges Pompidou, Paris, pointed out that some authors suggest a more direct approach. We find the coefficient of variation for each subject separately, square these, find their mean, and take the square root of this mean. We can call this the root mean square approach. She asked what difference there is between these two methods.
In practice, there is very little difference between these two ways of estimating within-subject coefficient of variation. They give very similar estimates.
This simulation, done in Stata, shows what happens. (The function invnorm(uniform()) gives a standard Normal random variable.)
. clear
Set sample size to 100.
. set obs 100
obs was 0, now 100
We generate true values for the variable whose measurement we are simulating.
. gen t=6+invnorm(uniform())
We generate measurements x and y, each with error proportional to the true value. (The error standard deviation is t/20, so the true within-subject CV is 0.05, or 5%.)
. gen x = t + invnorm(uniform())*t/20
. gen y = t + invnorm(uniform())*t/20
Calculate the within-subject variance for the natural scale values. (With two measurements per subject, the within-subject variance is the squared difference divided by 2: writing m = (x+y)/2, we have ((x-m)^2 + (y-m)^2)/(2-1) = (x-y)^2/2.)
. gen s2 = (x-y)^2/2
Calculate subject mean and s squared / mean squared, i.e. CV squared.
. gen m=(x+y)/2
. gen s2m2=s2/m^2
Calculate mean of s squared / mean squared.
. sum s2m2
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
s2m2 100 .0021519 .0030943 4.47e-07 .0166771
The within-subject CV is the square root of the mean of s squared / mean squared:
. disp sqrt(.0021519)
.04638858
Hence the within-subject CV is estimated to be 0.046 or 4.6%.
Now the log method. First we log transform.
. gen lx=log(x)
. gen ly=log(y)
Calculate the within-subject variance for the log values.
. gen s2l = (lx-ly)^2/2
. sum s2l
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
s2l 100 .0021566 .003106 4.46e-07 .0167704
The within-subject standard deviation on the log scale is the square root of the mean within-subject variance. The CV is the antilog (the exponential function, since we are using natural logarithms) minus one.
. disp exp(sqrt(.0021566))-1
.04753439
Hence the within-subject CV is estimated to be 0.048 or 4.8%. Compare this with the direct estimate, which was 4.6%. The two estimates are almost the same.
If we average the CVs estimated for each subject, rather than their squares, we do not get the same answer.
Calculate subject CV and find the mean.
. gen cv=sqrt(s2)/m
. sum cv
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
cv 100 .0361173 .0292567 .0006682 .1291399
This gives us the within-subject CV estimate = 0.036 or 3.6%. This is considerably smaller than the estimates by the root mean square method or the log method. The mean CV is not such a good estimate and we should avoid it.
Sometimes researchers estimate the within-subject CV using the mean and within-subject standard deviation for the whole data set. They estimate the within-subject standard deviation in the usual way, as if it were a constant. They then divide this by the mean of all the observations to give a CV. This appears to be a completely wrong approach, as it estimates a single value for a varying quantity. However, it often works remarkably well, though why it does I do not know. It works in this simulation:
. sum x y s2
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
x 100 6.097301 1.012154 3.62283 8.696612
y 100 6.081827 1.000043 3.759932 8.447584
s2 100 .0823188 .1212132 .0000193 .605556
The within-subject standard deviation is the square root of the mean of s2 and the overall mean is the average of the X mean and the Y mean. Hence the estimate of the within-subject CV is:
. disp sqrt(.0823188)/( (6.097301 + 6.081827)/2)
.04711545
So this method gives the estimated within-subject CV as 0.047 or 4.7%. This can be compared to the estimates by the root mean squared CV and the log methods, which were 4.6% and 4.8%. Why this should be I do not know, but it works. I do not know whether it would work in all cases, so I do not recommend it.
We can find confidence intervals quite easily for estimates by either the root mean square method or the log method. For the root mean square method, this is very direct. We have the mean of the squared CV, so we use the usual confidence interval for a mean on this, then take the square root.
. sum s2m2
Variable Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
s2m2 100 .0021519 .0030943 4.47e-07 .0166771
The standard error is the standard deviation of the squared CVs divided by the square root of the sample size.
. disp .0030943/sqrt(100)
.00030943
The 95% confidence interval for the squared CV can be found as the mean plus or minus 1.96 standard errors. If the sample is small we should use the t distribution here. However, the squared CVs are unlikely to be Normal, so the CI will still be very approximate.
. disp .0021519 - 1.96*.00030943
.00154542
. disp .0021519 + 1.96*.00030943
.00275838
The square roots of these limits give the 95% confidence interval for the CV.
. disp sqrt(.00154542)
.03931183
. disp sqrt(.00275838)
.05252028
Hence the 95% confidence interval for the within-subject CV by the root mean square method is 0.039 to 0.053, or 3.9% to 5.3%.
For the log method, we can find a confidence interval for the within-subject standard deviation on the log scale. The standard error is sw/root(2n(m-1)), where sw is the within-subject standard deviation, n is the number of subjects, and m is the number of observations per subject.
In the simulation, sw = root(0.0021566) = 0.0464392, n = 100, and m = 2.
Hence the standard error is 0.0464392/root(2 * 100 * (2-1)) = 0.0032837.
The 95% confidence interval is 0.0464392 - 1.96*0.0032837 = 0.0400031 to 0.0464392 + 1.96*0.0032837 = 0.0528753.
Finally, we antilog these limits and subtract one to give confidence limits for the CV: exp(0.0400031)-1 = 0.040814 and exp(0.0528753)-1 = 0.05429817, so the 95% confidence interval for the within-subject CV is 0.041 to 0.053, or 4.1% to 5.3%. These are slightly narrower than the root mean square confidence limits, but very similar.
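The same log-method interval can be computed directly in Stata. Here is a minimal sketch continuing the simulation above (it assumes s2l is still in memory; sum leaves the mean in r(mean) and the number of observations in r(N)):
. quietly sum s2l
. local sw = sqrt(r(mean))
. local se = `sw'/sqrt(2*r(N)*(2-1))
. disp exp(`sw' - 1.96*`se') - 1
. disp exp(`sw' + 1.96*`se') - 1
The two disp commands reproduce the limits calculated by hand above, 0.041 and 0.054.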
I would conclude that either the root mean square method or the log method can be used.
2009 Diabetes Clinical Practice Recommendations
http://care.diabetesjournals.org/content/vol32/Supplement_1/
Thursday, January 08, 2009
http://circ.ahajournals.org/cgi/content/full/107/3/499#SEC6
Markers of Inflammation and Cardiovascular Disease Application to Clinical and Public Health Practice: A Statement for Healthcare Professionals From the Centers for Disease Control and Prevention and the American Heart Association
In 1998, the American Heart Association convened Prevention Conference V to examine strategies for the identification of high-risk patients who need primary prevention. Among the strategies discussed was the measurement of markers of inflammation.1 The Conference concluded that "many of these markers (including inflammatory markers) are not yet considered applicable for routine risk assessment because of: (1) lack of measurement standardization, (2) lack of consistency in epidemiological findings from prospective studies with endpoints, and (3) lack of evidence that the novel marker adds to risk prediction over and above that already achievable through the use of established risk factors." The National Cholesterol Education Program Adult Treatment Panel III Guidelines identified these markers as emerging risk factors,1a which could be used as an optional risk factor measurement to adjust estimates of absolute risk obtained using standard risk factors. Since these publications, a large number of peer-reviewed scientific reports have been published relating inflammatory markers to cardiovascular disease (CVD). Several commercial assays for inflammatory markers have become available. As a consequence of the expanding research base and availability of assays, the number of inflammatory marker tests ordered by clinicians for CVD risk prediction has grown rapidly. Despite this, there has been no consensus from professional societies or governmental agencies as to how these assays of markers of inflammation should be used in clinical practice.
How large an SE/variance is too large?
A relative standard error (RSE) greater than 30% was used to identify unreliable estimates. The RSE is defined as the standard error of the estimate divided by the estimate, multiplied by 100 [RSE = 100 × SE(b) / |b|], which is analogous to the coefficient of variation [CV = SD(b) / |b|].
- Klein (2002): Healthy People 2010 Criteria for Data Suppression (pdf)
- Parker (2017): National Center for Health Statistics data presentation standards for proportions (on page 3: relative CI width calculation and 130% cut-point)
National counts or estimates determined to be unstable are indicated with a footnote in the tables. Fatal injuries were identified as unstable if the number of deaths was <20 or the coefficient of variation (CV) was >30%, where CV = (SE / number of deaths) × 100. Nonfatal injuries were identified as unstable if the national estimate was <1,200, the number of sample cases used was <20, or CV was >30%, where CV = (SE / national estimate) × 100.
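As a minimal made-up illustration of the 30% rule: an estimate of 15 deaths with a standard error of 6 has CV = 100 × 6/15 = 40%, so it would be flagged as unstable. In Stata:
. local b = 15
. local se = 6
. disp "CV = " 100*`se'/abs(`b') " percent"
(The numbers here are hypothetical, chosen only to show the arithmetic.)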
Why are rates based on fewer than 20 cases marked as being unreliable?
by NY State Department of Health: Data Sources and Tools - Chronic Diseases and Conditions
Example of a NHIS article (Variance Estimation and Significance Testing)
"... Standard errors are shown for all percentages in the tables (but not for the frequencies). Estimates with relative standard errors (RSE) of greater than 30% and less than or equal to 50% are considered statistically unreliable and are indicated with an asterisk (*). Estimates with a relative standard error greater than 50% are indicated with a dagger (†) and the estimates are not shown..."
More about reliability
- Singh (2004) "A generalization of the Coefficient of variation with application to suppression of imprecise estimates "
- The National Electronic Injury Surveillance System "A tool for research"
- Precision of measurement by A New View of Statistics
- CV by Wikipedia
- Assessing Product Reliability by Engineering Statistics Handbook
- Reliability by the Research Methods Knowledge Base
Wednesday, January 07, 2009
Comments on : Women may get diabetes earlier than men
O-o-m-m. This conclusion must rest on some assumptions: that the definition of pre-diabetes is correct, that pre-diabetes must progress to diabetes, that the normal levels of these indexes are exactly the same in women and men (no sex difference?), that the incubation period from pre-diabetes to clinical diabetes is exactly the same in women and men, and so on. From my (admittedly poor) memory, there is very solid evidence that women generally live longer than men and that women have a lower risk of heart disease than men. I have no evidence to cast doubt on their findings (these are actually very interesting findings), but I doubt the simple conclusion.
Just some thoughts.
_____________________________________________
Subject: Chicago Health: Women may get diabetes earlier than men
January 7, 2009
Chicago Health
Women May Get Diabetes Earlier Than Men
http://www.nbcchicago.com/health/topics/Women_May_Get_Diabetes_Earlier_Than_Men.html
Women may show signs of diabetes far earlier than men, according to new research. The findings could lead to new diabetes screening procedures to help identify who is at greatest risk of developing the disease.
Researchers from the University of Buffalo studied newly identified risk factors for type 2 diabetes, a metabolic disease in which the body produces insufficient amounts of insulin, the hormone needed for cells to process glucose, or fails to use it. This leads to a buildup of glucose in the bloodstream. Type 2 diabetes increases the risk of developing heart disease, stroke, eye and kidney diseases, and other chronic illnesses.
According to the National Institutes of Health, an estimated 20.8 million Americans—7 percent of the population—had diabetes in 2005, 6.2 million of them undiagnosed. Of the diabetic population, an estimated 90 to 95 percent had the most common form of the disease, type 2 diabetes. In addition, government estimates indicate that at least 43 million Americans have prediabetes, a condition that occurs when blood glucose levels are high but not high enough to be classified as diabetes.
Recent research has shown that levels of chronic sub-acute inflammation, blood clotting factors and dysfunction in the cells lining the inside of arteries may be indicators of diabetes risk factors when tested in the blood.
The Buffalo researchers looked at 1,455 healthy men and women who participated in the Western New York Study between 1996 and 2001. That study tracked alcohol consumption and risk factors for cardiovascular disease. Participants were disease free with no indications of diabetes. They were tested and given physical examinations at the outset and during a six-year follow-up period. The researchers re-examined the participants between 2002 and 2004 and compared the new blood tests to results from 1996-2001.
The blood tests included fasting glucose and insulin levels, C-reactive protein, proinflammatory markers and markers for dysfunction of endothelial tissue lining blood vessels. C-reactive protein (CRP) is a substance produced by the liver that increases whenever there is inflammation in the body. CRP levels rise whenever there is an immune system response or activation. Women in the study had a higher incidence of prediabetes than men. Researchers could not explain why the differences occurred and said more studies are needed.
"Because these pre-diabetic markers are not routinely assessed and because diabetes is strongly linked with coronary heart disease, the study may help explain why the decline in death rates for heart disease in diabetic women lags behind that of diabetic men," lead author Dr. Richard Donahue said in a press release.
Donahue added: "Previous research had shown that hypertension and cholesterol were elevated among women who later developed diabetes. However, current findings that these novel risk factors [markers of endothelial dysfunction, chronic sub-acute inflammation and blood clotting factors] are elevated among women even earlier than previously recognized does suggest that the 'diabetic clock' starts ticking sooner for women than for men."
He suggested that women whose blood glucose levels increase over time should perhaps be screened more intensively for cardiovascular disease.
Saturday, January 03, 2009
- eBooks
- CDC: Public Health 101 Series
- CDC: Crisis & Emergency Risk Communication (CERC)
- UCLA Statistical Computing is a valuable resource for learning statistics and statistical software such as SAS, Stata, and R!.
- Statistics Solutions has some succinct explanations of statistical approaches, including Factor Analysis & SEM
- Zuur (2009). A Protocol for data exploration to avoid common statistical problems
- Age-Standardization and Age-Adjustment
- CDC. 2000 projected US population weight and distribution pattern
- SEER. Standard population for age-adjustment
- Institute of Medicine. The future of public health (1988), The future of the public's health in the 21st century (2003)
- Smelser (2001): International Encyclopedia of the Social & Behavioral Sciences [1st (2001)] [2nd (2015)]
- Lohr (2001): Sample Surveys: Model-based Approaches
- Dr. David Kleinbaum published his ActivEpi in 2001, which is now online for free (ActivEpi website)
- Majid Ezzati (2006): Global Burden of Disease and Risk Factors
- What is epidemiology? and Jokes.
- Interpretation of relative risk
- Poisson regression and count outcome
- Poisson regression and related
- How to calculate confidence interval of incidence rate under the Poisson distribution
- How to get predicted incidence rate using -poisson- of Stata
- How large an SE is too large?
- Walker (2016): A Guide to Section 508 Compliance Using SAS® 9.4 ODS
- Gordon (2014): An exercise in non-linear modeling
- Complex Sampling Survey
- Allen Downey (2014): Think Stats using Python
- Grinstead: Introduction to Probability
- Michael Lavine (2013): Introduction to Statistical Thought
- GRADE website: GRADE guidelines (Grades of Recommendation, Assessment, Development, and Evaluation) (2011)
- CONSORT website- Transparent Reporting of Trials: Guidelines for Reporting Observational & RCT Studies and Flow Diagram (2010)
- STROBE website: STROBE: The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (2007)
- Clinical Analyte Unit Conversion - Jay Clinic Service.
- OpenEpi provides statistics for counts and measurements in descriptive and analytic studies, stratified analysis with exact confidence limits, matched pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, chi-square for dose-response, and links to other useful sites.
- Statistical literacy
- Chart Chooser — the favorite tool for improved Excel and PowerPoint charts. There is an R! version of Chart Chooser (not many charts on the site, but the idea is great)
- Jon's Excel Charts and Tutorials - Peltier Tech
- Stats + Stories: The statistics behind the stories and the stories behind the statistics.
- Ann Emery: Data visualization blogs
- Broman (2017): Data organization in spreadsheets
- Vincent Granville (2014): 10 types of regressions. Which one to use?
- Moderator vs mediator
- Wikipedia: Mediation, Moderation
- David Kenny: Moderator, Mediator
- Baron & Kenny (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
- inference (2018). ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus. Interview of Judea Pearl: To Build Truly Intelligent Machines, Teach Them Cause and Effect
- Distribution (Probability, CDF, and Quantile)
- Wicklin (2018). Fit a distribution from quantiles (SAS)
- Tony Hey (2009): The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft/Publications)
- Yanir Seroussi: Causal Inference reading List
- Missing Imputation
- Blog: Multiple Imputation
- Allison (2014). Sensitivity analysis for not missing at random.
- Stata: Yulia Marchenko (2011). Chained equations and more in multiple imputation in Stata 12
- Survey data imputation
- Wells (2018): Approaches to imputing missing data in complex survey data
- Mukhopadhyay (2016): Survey Data Imputation with PROC SURVEYIMPUTE (Video)
- Resampling and Monte Carlo Simulation
- Latent Class Analysis
- Christopher Baum (2016): Introduction to SEM in Stata
- Jones (2012): A Stata plugin for estimating group-based trajectory models (traj)
- Curran-Bauer (2016): Introduction to Growth Curve Modeling: An Overview and Recommendations for Practice
- Nagin (1999): Analyzing developmental trajectories: a semiparametric, group-based approach
- Training Course
- Linear and nonlinear function/relationships/regression
- Khan Academy: Linear and nonlinear functions (1, 2), Exploring nonlinear relationships
- Richard Williams: Nonlinear relationships, Stata highlights
- Minitab: What Is the Difference between Linear and Nonlinear Equations in Regression Analysis?; Linear or Nonlinear Regression? That Is the Question; Curve Fitting with Linear and Nonlinear Regression
- UCLA: Nonlinear Regression in SAS; Nonlinear or Linear Model
- PennState: Logistic, Poisson, and Nonlinear Regression
- datascience+: First steps with Non-Linear Regression in R!
- StackExchange: How to tell the difference between linear and non-linear regression models?
- StatisticsSolutions: Nonlinear regression
- Wikipedia: Linear function; Nonlinear system; Linear regression; nonlinear regression
- Ruckstuhl: Introduction to Nonlinear Regression
- Motulsky (2016): Fitting curves to data using nonlinear regression
- Haan: What are nonlinear regression functions?
- Brannick: Curvilinear Regression
- Wicklin (2018). Solve a system of nonlinear equations with SAS
- Wicklin (2018). Fit a growth curve in SAS
- Trend analysis
- Blog: Trend Analysis
- NIH: Joinpoint (software)
- Bayesian
- Kruschke (2013). Bayesian estimation supersedes the t test
- bayestestR: Become a Bayesian master you will
- McElreath: Statistical Rethinking: Bayesian statistics using R & Stan open access online.
- Scott Cunningham: Causal Inference: the Mixtape (using Stata)
- Stata
- Blog: Stata - my first Stata program
- Stata Online Help and Document
- The Stata Journal
- Tips of Stata
- Dickman. Estimating and modelling relative survival using Stata (strs, stnet)
- Drukker. Programming an estimation command in Stata: a map to posted entries
- Sribney. How can I estimate correlations and their level of significance with survey data
- margins - undocumented and underdocumented features (see the sketch after this list)
- margins, gen() creates variables with predictions for each observation
- margins, at(varname=gen(exp)) generates the values at which predictions are made
- margins dis, at(age=gen(age)) gives the average prediction by dis at the observed age, which is equal to margins dis
- margins, at(age=gen(age+1)) gives the average prediction by dis at the observed age plus 1, which is equal to: .replace age=age+1 .margins dis
- margins dis, at(age=gen(age) age=gen(age+1)) combines both of the above in one call
- Average prediction at the observed age plus one standard deviation: .sum age .local sd=r(sd) .margins dis, at(age=gen(age+`sd'))
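- A self-contained sketch of these margins patterns, using Stata's auto data with foreign standing in for dis and weight standing in for age (these stand-ins are my assumption, not from the links above):
. sysuse auto, clear
. regress mpg i.foreign c.weight
. margins foreign
. margins foreign, at(weight=gen(weight))
. margins foreign, at(weight=gen(weight+100))
. quietly sum weight
. local sd = r(sd)
. margins foreign, at(weight=gen(weight+`sd'))
The first two margins calls give identical results, matching the note above; the last two shift weight by a constant and by one standard deviation.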
- Princeton University. Online Stata Tutorial at DSS
- Tiberlake
- Bayesian analysis in Stata 15
- Stata Tips #7 - dyntext, dyndoc and user-written commands (version 15)
- Stata Tips #8 spatial analysis in Stata 15 (version 15)
- Herrera (2017). Spatial econometrics methods using Stata
- Kondo (2015) Hot and cold spot analysis using Stata (The Stata Journal)
- Pisati (2010). Exploratory spatial data analysis using Stata
- How to estimate intraclass correlation with survey data (VIF)? (Link1, Link2)
- use "correlate" with aweight (it is equivalent to pweight) for point estimates of the correlation coefficient.
- use "svy: regress" for p-values. Do "svy: regress y x" and "svy:regress x y" and take the biggest p-value, which is the conservative thing to do.
- You might try the "corr_svy" statement which a module to compute correlation tables for survey data. It's based on the Sribney's procedures mentioned above.
- Or, you can get the correlation coefficient using "svy: regress y x", then "disp sqrt(e(r2))" to show coefficient (here e(r2) has squared R value. You can also calculated tolerance using "disp 1-e(r2)" and VIF (variance inflation factor) using "disp 1/(1-e(r2))" and , the A rule of thumb is that if VIF>10 then you need examine multicollinearity further.
- Alternative for VIF calculation: "regress y x z [pw=srvyweight]", then "estat vif"
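- Putting the commands above together, a minimal sketch (y, x, z, and srvyweight are placeholder names, and the single-line svyset is an assumed minimal design):
. svyset [pw=srvyweight]
. svy: regress y x
. disp "r = " sqrt(e(r2))
. disp "tolerance = " 1 - e(r2)
. disp "VIF = " 1/(1 - e(r2))
. regress y x z [pw=srvyweight]
. estat vif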
- Cross Validation
- Trevor Hastie's The Elements of Statistical Learning is a good and free book for more information. Thanks to Hastie.
- k-fold cross validation
- Stata: user written program crossfold (help file).
- R: Petr Keil (2013): AIC & BIC vs. Crossvalidation using R!.
- SAS: Using Validation and Cross Validation using PROC GLMSELECT.
- Deming: Cross Validation Using SAS
- Net reclassification improvement (NRI) Pencina (2011): Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers (an example of application of cross-validation).
- Data Visualization
- Survival Analysis
- Stephen Jenkins: Survival Analysis with Stata (U of Essex)
- Princeton German Rodriguez: Survival Analysis Pop 509 course notes
- Roberto Gutierrez: On Frailty Models in Stata (used the same dataset (bc.dta) by Jenkins' course)
- Austin (2017): A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications
- How to split single observation into multiple observations by event time (Lexis Diagram): Stata - stsplit, R! - survival::survSplit and Epi::Lexis, SAS - Lexis.sas (pdf)
- Multilevel and Small Area Estimation (SAE) Analysis
- Princeton German Rodriguez: Multilevel Models Pop 510 course notes
- NIH (2000). Progress and promise in research on social and cultural dimensions of health - A research agenda (Video)
- University of Bristol. Centre for Multilevel Modelling
- Joop Hox (the author of Multilevel Analysis) has papers, programs, and lectures to download on his homepage.
- Rabe-Hesketh (2006). Multilevel modelling of complex survey data (Slides 2007)
- Multilevel models for complex survey data - The slides/articles of tutorial at 2011 BRFSS conference
- Paul Allison (2017). Using "Between-Within" models to estimate contextual effects
- Suchindran. Sampling weights and Regression Analysis
- Zaccarin (2008). The effects of sampling weights in multilevel analysis of PISA data. (Slides - Kiel 2009)
- Carle (2009). Fitting multilevel models in complex survey data with design weights: Recommendations
- D’Agostino (SASGF 2013). Multilevel Reweighted Regression Models to Estimate County-Level Racial Health Disparities Using PROC GLIMMIX
- Chantala. Software to Compute Sampling Weights for Multilevel Analysis
- UCLA papers on multilevel modeling
- Effects of Multicollinearity on Completed Model
- Suzuki (2012). Clarifying the use of aggregated exposures in multilevel models. This article discusses the multicollinearity issue between an individual-level variable and the aggregated variable.
- Feaster (2011). Multilevel models to identify contextual effects on individual group member outcomes
- Andrés Gutiérrez. Small Area Estimation 101
- Sun (2015) Analysis of spatial and temporal data
- Spatiotemporal analysis
- Skrondal (2009). Prediction in multilevel generalized linear models
- SAS
- PROC GLIMMIX (pdf v13.1) (SAS Documentation)
- Example 43.18 Weighted Multilevel Model for Survey Data (v13.1)
- Smith (2012). SAS Proc GLIMMIX for spatial analysis
- Kiernan (2012). Tips and Strategies for Mixed Modeling with SAS/STAT® Procedures
- Predictive Modeling Tips | Free Best Practices Guide
- Stata
- Multilevel mixed-effects models
- me: multilevel models to fit random-intercept and random-slope models.
- xt: Random-effects panel-data estimators
- Multilevel linear models in Stata, part 1: Components of variance (xt, Video)
- Multilevel linear models in Stata, part 2: Longitudinal data (xt, Video)
- Multilevel generalized linear models (me, Video)
- User-written program: gllamm
- Huber (2014). How to simulate multilevel/longitudinal data
- Rabe-Hesketh (2008). Prediction in multilevel logistic regression
- Blogs etc.
- Statistics Notes in the BMJ
- Statistical Horizons Blog (Paul Allison)
- Rick Wicklin Blog (SAS)
- R Bloggers (R!)
- The Stata Blog (Stata)
- Bendix Carstensen (R! and Age-Period-Cohort analysis)
- medRxiv.org: the preprint server for health sciences
- arXiv.org: Open access to e-prints in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics
- Allison (2018). For causal analysis of competing risks, don't use Fine & Gray's subdistribution method
- Kevin Markham (2015). Should you teach Python or R for data science?
- Allison (2014). Prediction vs. Causation in Regression Analysis
- Norm Matloff (2014). Why are we still teaching T-tests?
- Rick Wicklin (2014). How to choose colors for maps and heat maps.
- Nathan Yau. How to Visualize and Compare Distributions
- Daniel Lakens: The 20% Statistician
- Estimating the covariance of the means from two samples?
- Jonas Kristoffer Lindelov: Common statistical tests are linear models (Chinese version)
- Rodriguez. Generalized linear models - 7.4 the piecewise exponential model
- Statistical Reflections of a Medical Doctor (2012). Survival Analysis via Hazard Based Modeling and Generalized Linear Models
- How to do bootstrapping/Jackknife using Stata?
- Journal: Significance communicates and demonstrates statistical practice in an entertaining, thought-provoking, and non-technical way.
- Journal: Statistical Science has some good philosophical articles and interviews with experts.
- Journal: Survey Statistician
- Public-accessible Health-related Datasets
- Google Public Data Explorer The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings.
- How to Estimate Percentiles and Confidence Intervals
- Timeline of Statistics (pdf)
- Distinguished Lectures of The Joint Program in Survey Methodology, University of Maryland
- Sorensen: The Use and Misuse of the Coefficient of Variation in Organizational Demography Research
- Predict a value and estimate the variance of a single response instead of average response, individual vs. marginal, standard error of prediction (Engineering Statistics Handbook, SAS JMP, StackExchange, R-bloggers, Martha Smith) marginal distribution (Statistics How To, Jason Brownlee, StackExchange, Khan Academy)
- UCLA: Delta Method in R!
- Video Clip/Webinar
- Hans Rosling shows the best stats you've ever seen
- David McCandless: The beauty of data visualization
- Demo: Stunning data visualization in the AlloSphere
- US Department of Veterans Affairs: Cyberseminar of HSR&D (Health Services Research & Development)
- Science Webinar (2019): Selling without selling out: How to communicate your science
- ESRI: Spatial Statistics Presentations
- SAS: A hands-on introduction to SAS data step hash programming techniques
- Math
- Why is the limit (1−1/n)^n equal to 1/e?
- Limit of (1+x/n)^n when n tends to infinity
- L'Hôpital's Rule is a powerful technique for finding the limit of an indeterminate form 0/0 or ∞/∞. What we need to do is differentiate the numerator and denominator and then take the limit (see the worked example after this list)
- Alder (2001): An introduction to mathematical modelling
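As a worked example tying the three math items above together, here is the 1/e limit done with logarithms and L'Hôpital's rule (a sketch in LaTeX notation):

\lim_{n\to\infty}\left(1-\frac{1}{n}\right)^{n} = \exp\left(\lim_{n\to\infty} n\ln\left(1-\frac{1}{n}\right)\right).

Substituting x = 1/n turns the inner limit into the indeterminate form 0/0, so L'Hôpital's rule applies:

\lim_{x\to 0^{+}}\frac{\ln(1-x)}{x} = \lim_{x\to 0^{+}}\frac{-1/(1-x)}{1} = -1,

and therefore the limit is e^{-1} = 1/e. The same substitution shows that (1+x/n)^n tends to e^x as n tends to infinity.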