Epidemiology and Beyond

Thursday, September 01, 2016

Stata: display system date

Stata: display system date

.di "system date:" c(current_date)
.di "system date:" "$S_DATE"
.di %td_CY-N-D date("`c(current_date)'","DMY") // "` '" are not necessary
.di %td_CY-N-D date("$S_DATE","DMY")
.di "system year: " year(date(c(current_date),"DMY") // w/o `' around c(current_date)
.di "system month: " month(date(c(current_date),"DMY"))
.di "system day:" day(date(c(current_date),"DMY"))
.di "system year: " year(date("$S_DATE","DMY"))
.di "system month: " month(date("$S_DATE","DMY"))
.di "system day:" day(date("$S_DATE","DMY"))
more examples:

Works ('local' with '=')

local dd=day(date(c(current_date),"DMY"))
local mm=month(date(c(current_date),"DMY"))
local yy=year(date(c(current_date),"DMY"))
log using "output_`yy'_`mm'_`dd'.log", replace
log close

Doesn't work ('local' without '=')

local dd day(date(c(current_date),"DMY"))
local mm=month(date(c(current_date),"DMY"))
local yy=year(date(c(current_date),"DMY"))
log using "output_`yy'_`mm'_`dd'.log", replace
log close // invalid 'DMY' r(198)

Works ('global' with '=')

global dd=day(date(c(current_date),"DMY"))
global mm=month(date(c(current_date),"DMY"))
global yy=year(date(c(current_date),"DMY"))
log using "output $yy-$mm-$dd.log", replace
log close

Sunday, July 31, 2016

Recycle/reuse returned results in Stata

Recycle/reuse returned results in Stata

UCLA: "How can I access information stored after I run a command in Stata (returned results)?"
The Stata Blog: Drukker (2015). Programming an estimation command in Stata: Where to store your stuff
Stackoverflow (2014).Saving coefficients and standard errors as variables
Lembcke (2009). Advanced Stata Topics
SSCC. An Introduction to Mata
Stata commands are grouped into 4 major categories: r-class, e-class, s-class, and n-class commands. Also a c-class contains the values of system parameters and settings, along with certain constants.
The commands produce the statistical results are either r-class or e-class. e-class commands produce the estimation results, others are belong to r-class.
After submitting "contrast", Stata generates a L matrix (r(L)), you can check the contrast coefficients using "matrix list r(L)".
If don't know what results are outputted, use "return list" or "ereturn list" to find them. The scalar results from a r-class can be used with the "r(...)" and scalar results from e-class command can be used with "e(...)". Here, "..." is the name showed using "return list" or "ereturn list". The use of results in matrix form is a little tricky. "_b[...]" or "_se[...]" have to be used; here, "..." is the variable name of a coefficient in the model. The results for a constant is used as "_b[_cons]" for beta coefficient or "_se[_cons]" for standard error. A matric results can also converted into a matrix: "mat B=e(b)", then "disp B[rowno, colon]".
To show variance-covariance matrix, use: "estat vce" or just simple "matlist e(V)", and to show correlation, use: "estat vce, correlation".
You can "estimate store" and "estimate restore" a set of estimates with a name in memory, in such way, the following command will not be erased. If want to save and use it as a permanent file, you can use "estimate save" and "estimate use".
A single number can been converted into scalar, for example, "scalar xyz=_b[agecat]". However, the scalar has to be used with a pseudofunction scalar(), for example, "display scalar(xyz)" (more info)
The e(V) and e(b) matrices can be converted into variables of a dataset using "svmat" (convert variables into matrix using "mkmat"), which is similar to "putmat and getmat" of mata (matrix ref.):

mat D = e(b)', e(b)'
svmat double D, name(coef)
mat se1=vecdiag(e(V))
mat se2=vecdiag(e(V))
mat SE = se1, se2
svmat SE, name(se)

The "ereturn display" can use the e(V) and e(b) matrices to return a r-class matrix "r(table)"

"margins" also gives e-class results:

webuse dollhill3,clear
poisson deaths i.smokes##c.agecat, exposure(pyears)
est store tempreg
margins smokes, gen(dhat) predict(ir) // undocumented gen()
mean dhat1 // for smokes = 0
scalar dhat1=_b[dhat1] // .00810452
margins smokes, eydx(agecat) predict(ir) post
scalar eydxsmokes0=_b[0.smokes] // 1.046826
est restore tempreg
margins smokes, dydx(agecat) predict(ir) post
scalar dydxsmokes0=_b[0.smokes] // .00848402
disp scalar(dydxsmokes0)/scalar(dhat1) // gives 1.046826

Gould(2010).Mata Matters; (2011).Mata, the missing manual. Baum(2009).Using Mata to work more effectively in Stata
putmat and getmat - Put Stata variables into Mata and vice versa

mata r2=(1\2\3)
mata b=st_matrix("e(b)")'
mata se=sqrt(diagonal(st_matrix("e(V)")))
getmata r2 b se, force
vwls b r2, sd(se)
reg b r2

Rename "rowname" and "colname" of a matrix

     program estmatrename, eclass
       matrix BB = e(b)
     matrix colnames BB = "1.race" "2.race" "3.race"
       ereturn repost b = BB, rename
      matrix VV = e(V)
      matrix colnames VV = "1.race" "2.race" "3.race"
      matrix rownames VV = "1.race" "2.race" "3.race"
      ereturn repost V = VV
     end

total heartatk [pw=swgt], over(race)
estmatrename
lincom (_b[3.race]-_b[1.race])/2
test _b[1.race]=_b[2.race]
contrast {race 1 -1 0}
contrast p(1).race

Convert ln(RR) into RR and percent change

webuse dollhill3
poisson deaths smokes i.agecat,exposure(pyears) irr margins agecat, predict(ir) post
qui nlcom (lnRR21:ln((_b[2.agecat]/_b[1.agecat])))(lnRR31:ln((_b[3.agecat]/_b[1.agecat]))) (lnRR41:ln((_b[4.agecat]/_b[1.agecat]))), post
ereturn disp,eform(RR) cformat(%5.2f) pformat(%5.4f)
mat rtable=r(table)'
mat RR=rtable[1...,"b"],rtable[1...,"ll".."ul"]
mata st_matrix("pctable",(st_matrix("RR"):-1):*100)
mat coln pctable=RR LL UL
matlist pctable, format(%10.2f)r

Monday, May 23, 2016

The 21 greatest graduation speeches of the last 60 years

Vox: The 21 greatest graduation speeches of the last 60 years
by German Lopez on May 11, 2016
"Graduation speeches are the last opportunity for a high school or college to educate its students. It's unsurprising, then, that these institutions often pull in some of the world's most powerful people to leave an equally powerful impression on their students. Here are the best of those speeches and some of the sections that resonate the most..." (May 11, 2016)
To read and watch the full article on the Vox website here.

Sunday, May 01, 2016

R! Books

R! Books

Oscar: Big Book of R collection
RStudio: Cheatsheets
W.N. Venables. An Introduction to R
Hadley Wickham & Garrett Grolemund (2017). R for Data Science
Hadley Wickham. Advanced R
Winston Chang: R Graphics Cookbook, 2nd
Christoph Hank: Introduction to econometrics with R
Neale Batra: R for applied epidemiology and public health
David Dalpiaz: Applied Statistics with R (HTML version) GitHub
Colin Gillespie. Efficient R programming
Hadley Wickham (2015). R Packages
Yihui Xie. bookdown: Authoring Books and Technical Documents with R Markdown
Yihui Xie. R Markdown: The Definitive Guide
Julia Silge: Text Mining with R
Patrick Burns (2011). The R Inferno
Daniel Navarro. Learning Statistics with R
Trevor Hastie, Robert Tibshirani, Gareth James, Daniela Witten: An Introduction to Statistical Learning, with Applications in R 2nd. (pdf) with the excellent self-paced video training course. (here is the 15-hours of video training video abstracted by the Data School (YouTube))
Trevor Hastie (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Norman Matloff. The Art of R Programming (part)
Michael Crawley (2007). The R Book
Bolker (2007).Ecological models and data in R (2007 draft). Appendix (w/ delta method)
Winston Chang. Cookbook for R
Verzani. simpleR - Using R for Introductory Statistics
Kerns. Introduction to Probability and Statistics Using R
Peng. R Programming for Data Science, The Art of Data Science, Exploratory Data Analysis with R
Yakir. Introduction to Statistical Thinking (With R, Without Calculus)
Aragón. Population Health Data Science with R
赵鹏, 谢益辉, 黄湘云现代统计图形 (Modern Statistical Graphics)

Thursday, March 31, 2016

how to configure R! environment

Where/how to configure R start-up environment
There are several approaches can be used to customize the R working environment such as options and library directory etc. at R start-up:

Modify the R original profile file directly. The "Rprofile.site" is under the directory ".\R directory name\etc\". At both startup and end, the R will use the "Rprofile.site" file, then looks for the user-defined ".Rprofile" file in the current working directory (run "getwd()" to find the current location of working directory) or in the user's R home directory (run "R.home()" or "Sys.getenv("R_HOME")"to find where it is). You can edit the "Rprofile.site" file or create a ".Rprofile" file to customize the startup. For more information see Initialization at start of an R session and Customizing startup. I am using R-Portable and prefer to create a ".Rprofile" in the same directory of "R-Portable.exe" file. In such way, I don't need to dig deep and edit the R original setting.

to lists all the options can be set, run "names(options())"
to show the value of an item, run "options("option name")", for example:

"options("digits")" shows "$digits, [1] 7", which means the number will be shown 7 digits.
"options("defaultPackages")" shows the packages attached by default when R starts up

to modify the values of an option item, run "options(xxx=yyy), for example:

"options(digits=15)" changes the digit number into 15. Notes: this is for setting full length of number but not number of decimal places. To set the number of decimal, try such as "round(4/3, digits=2)" with 2 decimal places but not in "options()" unfortunately.

to set the directory of personal R library, create a ".Rprofile" file in the working directory and include ".libPaths(c(.libPaths(),"c:/myRlib directory name")", save it.
or, edit "Rprofile.site", add line: Add line: ".libPaths(c(.libPaths(),"c:/myRlib directory name")"

When use RStudio as the IDE, modify the options file ("Options.R") under the ".\Rstudio directory name\R\". The option setting overwrites the option setting in R profiles both "Rprofile.site" and ".Rprofile".

to set the directory of personal R library, edit file "Options.R", add line: ".libPaths(c(.libPaths(),"c:/myRlib directory name")", then save the "Options.R".
or, to use ".Rprofile", this file needs be in the working directory when not in a project (to set this master working directory using RStudio GUI: tools -> Global options... -> change the "Default working directory(when not in a project):"). Also you can change R.home() under the "R version:".

By the way, the options and the directory of package library can also be changed after the start-up of R.
de Vries (2015).Best practices for handling packages in R projects
Gillespie. R startup

Saturday, February 27, 2016

Doing Basic Calculus Using R!

Doing Basic Calculus Using R!
Differentiation Rules/Rules for Calculating Derivatives

Constant: f'(c) = 0, here c as a constant
Scalar Multiple: f'[cf(x)] = cf'(x)
Sum and Difference: [f(x) ± g(x)]' = f' (x) ± g' (x)
Product: [f(x) * g(x)]' = f'(x) * g(x) + f(x) * g'(x)
Quotient: [f(x) / g(x)]' = [g(x) * f'(x) - f(x) * g'(x)] / g(x)²
Power: f'(xⁿ) = n * x^(n-1)
Chain Rule: [f(g(x)]' = f'(g(x)) * g'(x)
Exponential: f'(e^x) = e^x Arbitrary base: f'(b^x) = b^x * lnb
Logarithmic: f'(ln|x|) = 1/x Arbitrary base: f'_b(logx) = 1/(x lnb)

Calculating Derivative and Integration Using R!

R can symbolically find the derivative of any function by using the function D() with function expression(). R knows how to use the chain rule as well.

First derivative: D(expression(x^2), "x") ===> 2 * x
Higher derivative: D(D(expression(x^2),"x"), "x") ===> 2
Partial derivative: D(expression((y-x)/y),"x") ===> -(1/y) and D(expression((y-x)/y),"y") ===> 1/y - (y - x)/y^2, which is equal to x/y^2
with the eval() function, you can get the value using particular values of its parameters: x =10; eval(D(expression(x^2), "x")) ===> 20
D(expression(pnorm(x)),"x") ===> dnorm(x)
D(expression(dnorm(x)),"x") ===> -(x * dnorm(x))

R can numerically perform one dimentsional integration using function integrate()

integrate(dnorm,-Inf,Inf) ===> 1 with absolute error < 9.4e-05
integrate(dnorm,-2.58,2.58) ===> 0.99012 with absolute error < 1.9e-08
integrate(function(x) {x^3 + x}, 0, 1) ===> 0.75 with absolute error < 8.3e-15

Other differentiation related R packages

Deriv is for symbolic differentiation.
Ryacas allows R users to access the yacas computer algebra system that does an excellent job of differentiation.

Use R to Compute Numerical Integrals

Online Calculators:

Taylor's Series is a series expansion of a function near a point. A real function f(x) which is close to a point a can be estimated as:

f(x) =∑(f⁽ⁿ⁾(a)/n! * (x - a)ⁿ
If a = 0, the expansion is known as a Maclaurin series.
Mathematical Annotation to write math symbols and expressions in R graphics (cheat sheet).

黄湘云 (2016).R语言做符号计算
R! How can I include Greek letters in my plot labels?

Wednesday, February 10, 2016

accept-reject algorithm

Accept-reject algorithm
Accept-reject algorithm (acceptance-rejection method) or reject sampling is a simple and general simulation method to decide observations with or without a trait from the probability of a distribution. In this way, we can convert a probability into a dichotomous condition (i.e. yes or no). Basically, there are three steps:

Step 1. Generate Y from density g [Y = f(x), the pdf of f(x) is the target distribution]

Sample a point (an x-position) from the proposal density distribution (g) and draw a vertical line at this point, get the density (an y-position) [X ~ g(x)]. The density function of Y has a upper, a constant c, and c is >=1.

Step 2. Generate U from the uniform distribution on the interval (0, cg(x)) [U = cg(x), the pdf of cg(x) is the proposal distribution]

Sample uniformly along in the range of x-position (i.e. uniformly from 0 to the maximum of the probability density function) [U ~ runif(0, 1)]

Step 3. If U <= Y, then set Y = X ("accept"), else repeat Steps 1 and 2

Pr(X|accept) = Pr(accept|X) x Pr(X)/Pr(accept), using Bayes' theorem
Pr(accept|X) = f(x)/cg(x)
Pr(X) = g(x)
Pr(accept) = 1/c
therefore, Pr(X|accept) = f(x)

Example: Stata simulation and define the event

clear
 set seed 770488
 set obs 1000

 gen x = runiform() - .5
 gen z = runiform() - .5
 gen xb = x + 8*z

 gen y = 1 / (1 + exp(xb)) < runiform() // y defined as 0 or 1 
 logistic y x z

Monday, December 21, 2015

ggplot2 2.0.0

ggplot2 2.0.0

I have used the ggplot2 package for a while and really like this package. It's happy to see that Hadley Wickham has officially updated the ggplot2 to version 2.0.0. On the RStudio Blog, Hadley highlights several important changes:

ggplot2 now has an official extension mechanism.
There are a handful of new geoms, and updates to existing geoms.
The default appearance has been thoroughly tweaked so most plots should look better.
Facets have a much richer set of labelling options.
The documentation has been overhauled to be more helpful, and require less integration across multiple pages.
A number of older and less used features have been deprecated.

You can find the document/manua on the project website. Many times, I go to the dev website to find the latest document/vignettes (extension, aesthetic specifications, themes).

The R Graphics Cookbook by Winston Chang is a must-have book to learn and become an expert of ggplot2 user. You can find the codes here from the Cookbook-R, and the Google book here.

Tuesday, December 15, 2015

general linear models vs. generalized linear models

General linear models vs. generalized linear models

Wikipedia: Comparison of general and generalized linear models (modified)

	General linear model	Generalized linear model
Typical estimation method	Least squares, best linear unbiased prediction	Maximum likelihood or Bayesian
Special cases	ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, mixed model, t-test, F-test	linear regression, logistic regression, Poisson regression, gamma regression
Function in R	lm()	glm()
Function in Matlab	mvregress()	glmfit()
Procedure in SAS	PROC GLM, PROC MIXED	PROC GENMOD (PROC LOGISTIC for logistic regression only), PROC GLIMMIX Comparing the MIXED and GLIMMIX
Command in Stata	regress	Stata Generalized linear models
Function in Mathematica	LinearModelFit	GeneralizedLinearModelFit
Command in EViews	ls

Generalized linear models have the flexiblility for response variables that have other than a normal distribution. If a generalized linear model uses an identity link function and a normal family distribution, then this model is equivalent to a general linear model.
Generalized linear mixed models have the flexibility to model random effects and correlated errors for nonmormal data.

non-probability sample

Non-Probability Sample

Definition

Reflection

Wretman (2010) Reflections on Probability vs Nonprobability Sampling (pdf)
Doherty (1994) Probability versus Non-Probability Sampling in Sample Surveys (pdf)

Video

Washington Statistical Society Mini-Conference Presentations (9/9/2015, Vimeo)

Friday, November 20, 2015

All-cause mortality was increasing among US middle age whites

All-cause mortality was increasing among US middle age Whites

Title: Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century
Authors: Anne Case and Angus Deaton
Abstract: This paper documents a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change reversed decades of progress in mortality and was unique to the United States; no other rich country saw a similar turnaround. The midlife mortality reversal was confined to white non-Hispanics; black non-Hispanics and Hispanics at midlife, and those aged 65 and above in every racial and ethnic group, continued to see mortality rates fall. This increase for whites was largely accounted for by increasing death rates from drug and alcohol poisonings, suicide, and chronic liver diseases and cirrhosis. Although all education groups saw increases in mortality from suicide and poisonings, and an overall increase in external cause mortality, thosewith less education saw the most marked increases. Rising midlife mortality rates of white non-Hispanics were paralleled by increases in midlife morbidity. Self-reported declines in health, mental health, and ability to conduct activities of daily living, and increases in chronic pain and inability to work, as well as clinically measured deteriorations in liver function, all point to growing distress in this population. We comment on potential economic causes and consequences of this deterioration. Full text: PNAS

Related articles:

Epimonitor: All Cause mortality rate takes surprising upward turn for middle age Whites in the US

The Washington Post: Prestigious medical journals rejected stunning study on deaths among middle-aged whites

The New York Times: Death rates rising for middle-aged White Americans, study finds

Vox: A big study found spiking death rates for middle-aged white Americans. Critics say it's more complicated

New York: Anne Case, Co-author of the White Mortality Paper, Responds to a New Critique of Its Approach to Gender (Updated)