Epidemiology and Beyond

Wednesday, April 05, 2017

R! related

Blog: R Documentation and Learning Resources

IDE and GUI

RStudio is an IDE for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. RStudio Webinars
Rcmdr is a GUI for R.
esquisse is a great addin for ggplot2
Deducer is a good but relatively old GUI for exploring data, like JMP, with ggplot2 behind it (Plot Builder). JGR is a Java GUI for R (ggplot2 – much easier with JGR and Deducer). To use Deducer, run install.packages(c("JGR", "Deducer", "DeducerExtras")), then library(JGR) and JGR(); then, in the JGR console, load Deducer by going to 'Packages & Data' > 'Package Manager' and selecting Deducer and DeducerExtras. More: all-in-one installing JGR and Deducer. Notes: 1) set the Java location using options(java.home="xxx/Java/"). 2) rJava ver 9.6 needed to run under 32-bit R! on my computers.
GrapheR (pdf) is another GUI, for drawing customized graphs without knowing any R commands.
Tessera - Open source environment for deep analysis of large complex data (Divide and recombine)
The application Bio7 is an integrated development environment for ecological modelling and contains powerful tools for model creation, scientific image analysis (ImageJ) and statistical analysis.

Communication between SAS and R

haven allows you to load foreign data formats (SAS, SPSS and Stata).

SAS calls functions in R language (pdf): PROC IML; RUN ExportDatasetToR("xxx.mydata"); SUBMIT / R; ... R codes ... ENDSUBMIT; RUN ImportDataSetFromR("WORK.xxx", "xxx"); QUIT;

Bewerunge (2011). Call R function from SAS (pdf)

Graphical parameters

Plot symbols

plot symbols (pch), size

Color

Pick a color site built in R with Shiny tags %$%

R Color Cheat Sheet, Color, palette (ggplot2: ggplot2 color, Cookbook for R), chart of R! colors.

RColorBrewer provides palettes for drawing nice maps shaded according to a variable.

Wesanderson provides palettes derived from the Tumblr blog Wes Anderson Palettes.

Simple Statistics: Colors in R

Glynn (2007). Using color in R

R! How can I include Greek letters in my plot labels?

Revolutions: How to make a heat map in R, Superheat: supercharged heatmaps for R

Chart Chooser: improves Excel and PowerPoint charts. There is an R! version of Chart Chooser (not many charts on the site, but the idea is great)

Packages (rdrr.io, CRAN, Rdocumentation, Inside-R, Quick-R, Bioconductor)

CRAN Task View organizes the packages into different groups such as Graphics, Survival Analysis, etc.
margins: An introduction to 'margins'
Transition from Excel to R!:

DT: An R interface to the DataTables library
excelR: An R interface to jExcel library to create web-based interactive tables and spreadsheets compatible with Excel or any other spreadsheet software
DTedit: Editable DataTables for shiny apps
rhandsontable is an htmlwidget based on the handsontable.js library.
formattable is designed for applying formatting on vectors and data frames
rpivotTable is an R wrapper for the great library pivottable

Zelig is a general purpose statistics program for estimating, interpreting, and presenting results from any statistical method. It turns the power of R, with its free-ranging syntax, diverse examples, and documentation written for different audiences, into the same three commands and consistent documentation for every method

Report and documentation

The knitr package was designed to generate dynamic reports with R. Chunk options.
rmarkdown is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R.
bookdown facilitates writing books and long-form articles/reports with R Markdown. Several very good free books are published on the bookdown website

Programming

Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation.

Data I/O and manipulation

data.table package provides an enhanced version of data.frame. Here is the manual and help. It's a star package on r-pkg.org.
rio is a Swiss-army knife for data I/O.
readr makes it easy to read many types of tabular data. read_fwf() is a powerful function for reading fixed-width files, for example, reading in the NCHS mortality file.
Janitor has simple functions for examining and cleaning dirty data
sqldf is an R package for running SQL statements on R data frames. sqldf supports SQLite database as default (SQLite syntax).

haven allows you to load foreign data formats (SAS, SPSS and Stata).

xlsx gives programmatic control of Excel files using R.

read.table() reads a file in table format and creates a data frame from it.

arrow supports analyzing large, multi-file datasets and working with individual Parquet and Feather files.

MMWRweek: Convert dates to MMWR Day, Week, and Year

dlookr: Diagnose, explore and transform data

foreign reads data stored by other statistical systems.

dplyr is a new package which provides a set of tools for efficiently manipulating datasets in R. Introduction to dplyr. Notes: "Both data.table and dplyr were able to reduce the problem to less than a few seconds. If you’re looking for pure speed data.table is the clear winner. However, it is my understanding that data.table's syntax can be frustrating, so if you’re already used to the ‘Hadley ecosystem’ of packages, dplyr is a formidable alternative, even if it is still in the early stages."

tidyr with spread() and gather(), a reframing of reshape2, is a new package that makes it easy to “tidy” your data.
magrittr provides a forward pipe operator (%>%). magrittr: Simplifying R code with pipes. History of Magrittr.
Aggregation and restructuring: reshape2 package with the melt() and dcast()/acast() functions. Transpose using the t() function; aggregate data using the aggregate() function.
lubridate is good for manipulating time and date.
stringr makes it easier to work with strings (article).
Quick-R examples of outputting R data.
RStudio addins: datapasta, ggthemeassist
pdftools: Text Extraction, Rendering and Converting of PDF Documents (example)
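
The base R aggregation and transposing functions mentioned above can be sketched as follows. This is only a minimal illustration: the data frame `visits` and its columns are made up for the example, and xtabs() is just one of several ways to get a wide layout.

```r
# Hypothetical toy data: clinic visits by site and year
visits <- data.frame(
  site  = c("A", "A", "B", "B", "A", "B"),
  year  = c(2015, 2016, 2015, 2016, 2016, 2016),
  count = c(10, 12, 7, 9, 3, 1)
)

# aggregate(): total count for each site-by-year combination (long layout)
totals <- aggregate(count ~ site + year, data = visits, FUN = sum)

# xtabs(): spread the long totals into a site-by-year table (wide layout)
wide <- xtabs(count ~ site + year, data = totals)

# t(): transpose so that years become rows and sites become columns
wide_t <- t(wide)

print(totals)
print(wide_t)
```

The same reshaping can be done with reshape2 or tidyr; base R is used here only so that the sketch has no package dependencies.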

Survey related packages

survey (Thomas Lumley) provides facilities in R for analyzing data from complex surveys. Complex sampling and R (pdf). Analyze US Government Survey Data with R. You can use "survey" to create replicate weights, and withReplicates() computes variances by replicate weighting.
rpms is for Recursive Partitioning for Modeling Survey data
lavaan.survey is for complex survey structural equation modeling (SEM). An R Package for Complex Survey Analysis of Structural Equation Models (pdf)
sampling includes many different algorithms for drawing survey samples and calibrating the design weights.
convey is a package by Anthony Damico for estimating indicators of income concentration and poverty (including the Gini coefficient, Atkinson index, at-risk-of-poverty threshold, etc.), wrapped around Lumley's survey package. He has worked on "analyze survey data for free with R language" (asdfree) for years.

GIS

Pebesma: Spatial Data Science with application in R
Methods Bites: Using Geospatial Data in R
Pebesma: Simple Features for R
R-spatial.org: Drawing beautiful maps programmatically with R, sf and ggplot2, Part 1, Part 2, Part 3
geofacet: Introduction to geofacet
ggmap provides easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2. German Gas Prices illustrated, Cheatsheet

Robin. Introductory tutorial on graphical display of geographical information in R
Other packages: maps (draw geographical maps), maptools (reading and handling spatial objects), mapdata (extra map databases), scales (scale functions for visualization), mapproj (map projections), RgoogleMaps (overlays on static maps)

tidycensus is an R package that allows users to interface with the US Census Bureau’s decennial Census and five-year American Community Survey APIs

Data Visualization and Graphics

The R graph Gallery
Auguie: Laying out multiple plots on a page
patchwork: to arrange your graphs easily

ggplot2 (document) (You can build the book here: ggplot2: Elegant graphics for data analysis) is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. Cookbook for R is a great book. Tutorial websites: Tutorial I, Tutorial II. A good book website of Guide to Create Beautiful Graphics in R (Cheatsheet: Be Awesome in ggplot2). Beautiful plotting in R: A ggplot2 cheatsheet (pdf). Tutorial video (Part I and Part II) by Roger Peng

ggplot2 extensions gallery
ggThemeAssist is an RStudio addin that provides a GUI for editing ggplot2 themes
esquisse is an addin that allows you to interactively explore your data by visualizing it with the ggplot2 package.
cowplot and ggstatsplot are add-on packages for ggplot2
ggpubr provides some functions for creating figures that meet publication requirements
You may use ggplot_gtable() on the plot data generated by ggplot_build() to get the plot objects (grobs) (grobs <- ggplot_gtable(ggplot_build(yourgraph))), then use grid.draw() of the grid package to draw them individually. Hadley Wickham has blog posts about Mixing ggplot2 graphs with other graphical output and Editing raw Grid objects from a ggplot.
ggedit is used to facilitate ggplot formatting

A conversation with Hadley Wickham 2014 - Eduardo
ggthemes is a package including some extra geoms, scales, and themes styles (the Economist, Excel, Stata, etc.) for ggplot2.
GGally is designed to be a helper to ggplot2. It contains templates for different plots to be combined into a plot matrix, a parallel coordinate plot function, as well as a function for making a network plot.
gtable is a package of tools to make it easier to work with "tables" of grobs, which is internally used by ggplot2. I used the gtable::cbind(..., size = "max", z = NULL) or gtable::rbind(..., size = "max", z = NULL) to match the size of plots.
ggrepel is a ggplot2 extension to avoid overlapping text labels.
Walker (2014). International population pyramids with ggplot2

ggforce accelerates 'ggplot2'
ggraph is an extension of ggplot2 tailored at plotting graph-like data structures (graphs, networks, trees, hierarchies...)
WVPlots Provides examples of ggplot plots that can be generated from a standard calling interface
classifierplots generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call
ggalt: make a dumbbell plot in ggplot2

ggcharts aims to get you to your desired plot faster.
Plotly: ggplotly() of the plotly package is a powerful tool to convert ggplot2 plots and create interactive, online ggplot2 charts with D3.js. Carson Sievert used this converter to recreate Hadley Wickham’s entire ggplot2 documentation (here). Click-drag to zoom, shift-click to pan, double-click to autoscale. It's really amazing. Plotly Tutorial: Plotly and R

grid and gridExtra are useful combined with ggplot2, for example:

library(grid) # for unit() in ggplot2
library(gridExtra) # for grid.arrange()
grid.arrange(p1, p2, p3, p4, ncol=2) # p1 to p4 are existing ggplot objects

ggvis is a new package for data visualization. Like ggplot2, it is built on concepts from the grammar of graphics, but it also adds interactivity, a new data pipeline, and it renders in a web browser.
vcd for visualizing categorical data. Working with categorical data with R and the vcd and vcdExtra packages and Visualizing Categorical Data with SAS and R by Michael Friendly.
Distribution visualization: vioplot for violin plot (examples of boxplot and violin plot). Hintze (1998). Violin Plots: A Box Plot-Density Trace Synergism. Violin plot using ggplot2. Boxplots and beyond: Part I, II: asymmetry, III: violin plots, IV: beanplots.
venneuler is my favorite package for making area-proportional Venn and Euler diagrams simply and quickly; I use it to create diagrams and then use Inkscape to modify them. A more complicated package is VennDiagram. Wilkinson (2012) introduced his venneuler package in the article "Exact and Approximate Area-proportional Circular Venn and Euler Diagrams". Chen (2011) compared some R! packages and programs in the article "VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R" (Create a Venn diagram using MS Excel)

library(venneuler) # requires rJava
venn = venneuler(c(a1c=.15, fpg=.11, ogtt=.13, 'a1c&fpg'=.1, 'fpg&ogtt'=.13, 'a1c&ogtt'=.4, 'a1c&ogtt&unknow'=.1)) # set sizes and overlaps
plot(venn)

Vennerable package provides routines to compute and plot Venn diagrams, including the classic two- and three-circle diagrams but also a variety of others with different properties such as rectangular or square Venn diagram and for up to seven sets. It's not available at CRAN. You can install this package using: install.packages("Vennerable", repos = "http://R-Forge.R-project.org")
googleVis - Google Motion Charts with R!
colourlovers provides access to the COLOURlovers API, which offers color inspiration and color palettes
Lattice is a powerful and elegant high-level data visualization system, with an emphasis on multivariate data.
scatterplot3d provides routines for the visualization of multivariate data in a three dimensional space.
iPlots is a package for R! which provides high interaction statistical graphics, written in Java.
rcharts is an R package to create, customize and publish interactive javascript visualizations from R using a familiar lattice style plotting interface.
wordcloud2 and wordcloud create word clouds for data visualization. Using R to create a Word Cloud from a PDF Document. wordcloud2: possibly the best word cloud solution at present (in Chinese)
David Smith (2017). Packages to simplify mapping in R
magick is a wrapper package for ImageMagick, an image processing library
networkD3: D3 JavaScript Network Graphs from R, good at Sankey Diagram
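
The ggplot_build() / ggplot_gtable() / grid.draw() route and the gridExtra layout mentioned in this list can be sketched as below. This is a minimal sketch that assumes the ggplot2 and gridExtra packages are installed; the two plots of the built-in mtcars data are arbitrary examples, and the null PDF device is only there so the code also runs without a screen.

```r
library(ggplot2)
library(grid)
library(gridExtra)

# Two arbitrary example plots from the built-in mtcars data
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()

pdf(NULL)  # null graphics device: nothing is written to disk

# Convert a ggplot into a gtable of grobs, then draw it with grid
g1 <- ggplot_gtable(ggplot_build(p1))
grid.newpage()
grid.draw(g1)

# Arrange the two plots side by side; arrangeGrob() returns a gtable
# without drawing it, grid.arrange() would draw it immediately
combined <- arrangeGrob(p1, p2, ncol = 2)
grid.newpage()
grid.draw(combined)

invisible(dev.off())
```

Individual grobs can be inspected with g1$grobs and g1$layout before drawing.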

Missing Imputation

Amelia: A Program for Missing Data (missmap())
Berk (2021): Handling missing data using XGBoost, XGBoost
Josse: Handling missing values with R
MissingDataGUI: a GUI for missing data exploration
MICE: multiple imputation using fully conditional specification.

Statistics/Machine Learning

mlr – Machine Learning in R, and mlr3, a successor of mlr. Here is the mlr3 book
tidymodels: a combination of multiple packages: parsnip, recipes, etc.

tidyverse is a collection of popular packages for data munging from Hadley Wickham including ggplot2, dplyr, tidyr, readr, purrr, tibble, etc.

bayestestR provides a comprehensive and consistent set of functions to analyze and describe posterior distributions generated by a variety of models objects, including popular modeling packages such as rstanarm, brms or BayesFactor.
rms is a package that goes along with the book Regression Modeling Strategies
caret is a set of functions that attempt to streamline the process for creating predictive models. Here are the caret webinar and slides
splines is now part of the standard R! distribution. splines::ns() generates a basis matrix for natural cubic splines. Bendix tuned ns() as Ns() in his Epi package, which uses the smallest and the largest of the supplied knots as boundary knots. You can download the latest version on his website here.
lavaan: latent variable analysis
msm::deltamethod: delta method, FAQ UCLA
nls can be used to determine the nonlinear (weighted) least-squares estimates of the parameters of a nonlinear model, which is similar to the 'nl' of stata and 'proc nlin' of SAS
rpart is for the recursive partitioning for classification, regression and survival trees.

Therneau (2015). An introduction to recursive partitioning using the RPART routines
Classification Trees using the rpart function (2010)
Let's have a "party" and tear this place "rpart"! (2013)

Hmisc: Harrell Miscellaneous
convey estimates measures of poverty, inequality, and wellbeing using the complex survey data
deltavar() is a function in the emdbook package, which accompanies the book Ecological Models and Data in R; it calculates delta-method-based variances for functions with any number of parameters

deltavar(log(A/B),meanval=c(A=0.8,B=12),Sigma=matrix(c(0.1,0,0,8.0),nrow=2)) # 0.2118056 = (1/A)^2*Var(A) + (1/B)^2*Var(B); this is the variance, and its square root (about 0.4602) is the standard error
deltavar(log(A/B),meanval=c(A=0.8,B=12),Sigma=c(0.1,8.0)) # 0.2118056 (Sigma given as a vector of variances)

MASS is a classic package for Venables and Ripley's Modern Applied Statistics with S
Bolstad is a set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2016). The "bayes.lin.reg" function may be used for combining two estimates, as in a meta-analysis.
Psych: Using R for psychological research
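
The deltavar() example above can be checked by hand with base R only. This is a sketch of the first-order delta method for g(A, B) = log(A/B) with independent A and B; it uses the same means and variances as the example and does not call emdbook.

```r
# Delta-method variance of g(A, B) = log(A / B), using base R only
g     <- quote(log(A / B))
means <- list(A = 0.8, B = 12)
Sigma <- diag(c(0.1, 8.0))   # Var(A) = 0.1, Var(B) = 8.0, zero covariance

# Gradient of g evaluated at the means: dg/dA = 1/A, dg/dB = -1/B
grad <- c(A = eval(D(g, "A"), means),
          B = eval(D(g, "B"), means))

# First-order approximation: Var[g] is about grad' Sigma grad
delta_var <- drop(t(grad) %*% Sigma %*% grad)
delta_se  <- sqrt(delta_var)

print(delta_var)  # the variance, approximately 0.2118
print(delta_se)   # the standard error, approximately 0.4602
```

With a non-diagonal Sigma the same matrix product also picks up the covariance term.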

Electronic Health/Medical Record (EHR) related

coder: Deterministic Categorization of Items Based on External Code Data by rOpenSci (blog)

Friday, March 24, 2017

R functions and keyboard shortcuts

R functions/commands and keyboard shortcuts

R! is powerful and has rich packages and functions. It's impossible to build a list of functions/shortcuts to fit the purposes of all. Below are some functions/shortcuts related to my projects.

Cheatsheets, The R Guide, R Reference Card
Help functions: help()/?, apropos() (finds all objects matching a partial name), find() (shows where the found objects are located), methods(), example(), demo(), vignette(), args()
Housekeeping functions: getwd(), setwd(), rm(list=ls()) removes all objects in the R environment, source("myRscript.r") runs the R code in the "myRscript.r" file, fix() modifies the original object in place, edit() edits an object and returns a new object, download.file() downloads a file from the Internet, attach()/detach() objects, search() shows the current search path and its order, install.packages(), update.packages(), remove.packages(), getOption("defaultPackages"), which can be changed by setting the option in startup code (e.g. in ~/.Rprofile), .libPaths()

Numeric/character functions: length(), seq(), rep(), cut(), pretty(), cat(), substr(), grep(), sub(), strsplit(), paste(), toupper(), tolower()
Data functions: read.table(), head(), tail(), str(), class(), length(), dim(), nrow(), ncol(), names(), levels(), c(), cbind(), rbind(), append(), rep(), rev(), sort(), unique()
Type functions: "is." for checking or "as." for conversion + numeric(), character(), vector(), matrix(), data.frame(), factor(), logical(), integer(). For example: is.numeric(), as.numeric()
Mathematical functions: abs(), sqrt(), log(), log(x, base=n), log10(), exp(), prod(), factorial(), choose(), ceiling(), floor(), solve(), trunc(), round(), signif(), cos(), sin(), tan(), acos()
Statistical functions: mean(), median(), sd(), var(), mad(), quantile(), range(), sum(), diff(), min(), max(), scale(), fivenum(), cumsum(), cumprod(), cummax(), cummin(), cor(), colSums(), rowSums(), colMeans(), rowMeans()
Probability functions: the form is [d|p|q|r]distribution(), where d, p, q, and r stand for the (d)ensity, the cumulative (p)robability/distribution function, the (q)uantile function, and (r)andom generation, respectively. The distribution types can be: (norm)al, (beta), (binom)ial, (chisq)uared, (exp)onential, (logis)tic, (multinom)ial, negative binomial (nbinom), (pois)son, (f), (gamma), (t), (unif)orm, etc. For example: dnorm(), pnorm(), qnorm(), rnorm()
Statistical modeling functions

Model functions: lm(), glm(), nls(), nls2(), lme() / nlme()
Formula symbols (y ~ A + B + C): ":" is for an interaction term, "*" is for main effects plus their interaction, "^" is for crossing to a specified degree, "." is a placeholder for all other variables except the dependent variable, "-" removes a term from the formula, "-1" suppresses the intercept, and "I()" has the elements within the parentheses interpreted arithmetically

Post-estimation functions: coef(), confint(), resid(), fitted(), summary(), predict(), deviance(), print(), plot(), formula(), anova(obj1, obj2), AIC(), vcov()
Contrast functions: contr.helmert(), contr.poly(), contr.sum(), contr.treatment(), contr.SAS()
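
A few of the probability, model, and post-estimation functions listed above can be tried out in a short base R session. This is only an illustrative sketch; the choice of the built-in mtcars data, the variables, and the seed are arbitrary.

```r
# d/p/q/r prefixes, shown for the normal distribution
dens  <- dnorm(0)        # density of N(0, 1) at 0
prob  <- pnorm(1.96)     # cumulative probability P(Z <= 1.96)
quant <- qnorm(0.975)    # quantile with 97.5% of the mass below it
set.seed(2017)           # arbitrary seed so the draws are reproducible
draws <- rnorm(5)        # five random N(0, 1) values

# Formula operators with lm() on the built-in mtcars data
fit_main  <- lm(mpg ~ wt + hp, data = mtcars)       # main effects only
fit_inter <- lm(mpg ~ wt * hp, data = mtcars)       # expands to wt + hp + wt:hp
fit_quad  <- lm(mpg ~ wt + I(wt^2), data = mtcars)  # I() keeps ^ arithmetic
fit_noint <- lm(mpg ~ wt - 1, data = mtcars)        # "-1" drops the intercept

# Post-estimation functions
print(coef(fit_inter))
print(confint(fit_main))
print(anova(fit_main, fit_inter))   # compares the two nested models
print(AIC(fit_main, fit_inter))
```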

RStudio is an integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. Shortcuts (you can modify them: Tools -> Modify Keyboard Shortcuts...)

Alt + Shift + K: Show a Quick Reference
Alt + -: Insert assignment operator "<-"
Ctrl + Shift + M: Insert pipe operator "%>%" (I changed it to Ctrl + Shift + P)
Ctrl + Alt + I: Insert chunk (R Notebook/Markdown)
Ctrl + 1: Move cursor to source Editor window
Ctrl + 2: Move cursor to Command window
Ctrl + 3: Move cursor to Help window
Ctrl + 4: Move cursor to History window
Ctrl + 5: Move cursor to File window
Ctrl + 6: Move cursor to Plots window

Monday, March 13, 2017

choice of analytical language

I have mainly used three statistical languages, Stata, R, and SAS, for many years and for different purposes. My relative use of the three has shifted from SAS-Stata-R to SAS-R-Stata and then to Stata-R-SAS. I am sometimes asked to recommend the best analytic language, which is always a hard and complicated question for me. I came across a blog post by Curtis Miller that is very thoughtful and helpful for making this kind of choice: "On Programming Languages; Why My Dad Went From Programming to Driving a Bus". Hopefully his story can help you make your own decision.

Wednesday, March 08, 2017

Stata News: in the spotlight

Stata News: in the spotlight etc.

2021

2020

2019

Customized forest plots for displaying meta-analysis results
Importing data from SPSS and SAS
Fun with frames
Lasso
Interpreting models for log-transformed outcomes (unbiased prediction: $E(Y \mid X) = e^{X\beta} e^{\sigma^2/2}$)
User's corner: ftools and gtools
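
The retransformation formula in the log-transformed outcomes item follows in one line if the model is linear in ln Y with normally distributed, homoskedastic errors. That error assumption is made here only for the derivation; the linked article may cover more general cases.

```latex
\begin{aligned}
\ln Y &= X\beta + \varepsilon, \qquad \varepsilon \mid X \sim N(0, \sigma^2) \\
E(Y \mid X) &= e^{X\beta}\, E\!\left(e^{\varepsilon} \mid X\right)
             = e^{X\beta}\, e^{\sigma^2/2}
\end{aligned}
```

The factor $e^{\sigma^2/2}$ is the mean of a lognormal variable with log-scale variance $\sigma^2$, which is why simply exponentiating the linear prediction underestimates $E(Y \mid X)$.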

2018

2017

In the spotlight: Nonlinear multilevel mixed-effects models
Cheatsheet: User's Corner: Quick references for your favorite commands
What's new in Stata 15 (released on 2017-06-06, 15.1 released on 2017-12-20)
Visualizing continuous-by-continuous interactions with margins and twoway contour

2016

2015

Easy-to-interpret, flexible survival-time treatment effects, and Postestimation Selector
Treatment effects, and irt
Bayesian “random-effects” models, and What's New in Stata 14
Finding and using results, constants, functions ... anything (Data > Other utilities > Hand calculator), and forecast for dynamic panel data and counterfactuals

2014

2013

2012

2011

2010

Friday, March 03, 2017

Syndemics: health in context

A syndemic, a term coined by Merrill Singer in the mid-1990s, is a conceptual framework for understanding diseases or health conditions that arise in populations and that are exacerbated by the social, economic, environmental, and political milieu in which a population is immersed. Today's issue of the Lancet published a series related to syndemics... full text ...

Tuesday, January 24, 2017

Information about the Global Burden of Diseases, Injuries, and Risk Factors Study

WHO: the Global Burden of Disease (GBD) project
Lancet Global Burden of Disease
UW website directed by Christopher Murray
Wikipedia: History of GBD
List of causes and ICD9 and ICD10 of these causes in eTable 2 of supplement
Search title with "Global Burden of Disease Study" on the PubMed

Tuesday, January 03, 2017

Using multi-year national survey cohorts for period estimates: an application of weighted discrete Poisson regression for assessing annual national mortality in US adults with and without diabetes, 2000-2006.

Cheng YJ, Gregg EW, Rolka DB, Thompson TJ.

BACKGROUND:

Monitoring national mortality among persons with a disease is important to guide and evaluate progress in disease control and prevention. However, a method to estimate nationally representative annual mortality among persons with and without diabetes in the United States does not currently exist. The aim of this study is to demonstrate use of weighted discrete Poisson regression on national survey mortality follow-up data to estimate annual mortality rates among adults with diabetes.

METHODS:

To estimate mortality among US adults with diabetes, we applied a weighted discrete time-to-event Poisson regression approach with post-stratification adjustment to national survey data. Adult participants aged 18 or older with and without diabetes in the National Health Interview Survey 1997-2004 were followed up through 2006 for mortality status. We estimated mortality among all US adults, and by self-reported diabetes status at baseline. The time-varying covariates used were age and calendar year. Mortality among all US adults was validated using direct estimates from the National Vital Statistics System (NVSS).

RESULTS:

Using our approach, annual all-cause mortality among all US adults ranged from 8.8 deaths per 1,000 person-years (95% confidence interval [CI]: 8.0, 9.6) in year 2000 to 7.9 (95% CI: 7.6, 8.3) in year 2006. By comparison, the NVSS estimates ranged from 8.6 to 7.9 (correlation = 0.94). All-cause mortality among persons with diabetes decreased from 35.7 (95% CI: 28.4, 42.9) in 2000 to 31.8 (95% CI: 28.5, 35.1) in 2006. After adjusting for age, sex, and race/ethnicity, persons with diabetes had 2.1 (95% CI: 2.01, 2.26) times the risk of death of those without diabetes.

CONCLUSION:

Period-specific national mortality can be estimated for people with and without a chronic condition using national surveys with mortality follow-up and a discrete time-to-event Poisson regression approach with post-stratification adjustment. (Full text)

Wednesday, November 30, 2016

Use French cleat to hold things

Wikipedia. What is the French cleat?
Popular Mechanics. How to Build a French Cleat Shelf to Hold Virtually Anything
Family Handyman. Custom Garage Storage (Video)

Tuesday, November 29, 2016

Interview with J.J. Allaire

Interview with J.J. Allaire - the founder of RStudio
by Joseph Rickert
Welcome to “R Views”, the new R Community blog from RStudio. For this first post, I sat down with J.J. Allaire, RStudio’s founder and CEO, to discuss RStudio’s history, its mission and JJ’s vision for its future. In a short time, we touched on a wide range of subjects including RStudio’s business, the growth of the R language, the importance of the R Consortium to the R Community and J.J.’s advice to anyone coming to R for the first time. We hope you enjoy this “snapshot” of RStudio’s place in the R world. full text
You can also read a Chinese version here.

Thursday, October 13, 2016

'Big Fat Fix' Film Challenges Mediterranean Diet

An Interview With Cardiologist Aseem Malhotra
Editor's Note: Cardiologist Aseem Malhotra, MBChB, MRCP, talks about his new documentary The Big Fat Fix, which sent him to Pioppi, Italy, the village where Ancel Keys researched diet and cardiovascular health. A regular contributor to the BMJ and major UK newspapers on the topic of dietary health, Dr Malhotra believes that the demonization of fat let sugar off the hook as the real culprit in the diabetes, obesity, and cardiovascular disease epidemic, and that we need to rethink our approach to exercise. ... Full Text.

This article is another interesting opinion, based on facts and viewed from a different angle. The interview reminds me of Michael Pollan's book In Defense of Food, published in 2008: Food – Not Nutrients – Is The Fundamental Unit In Nutrition. (PBS Documentary In Defense of Food in Dec. 2015, PBS Newshour and on YouTube).
Food Insight (2015). 4 Food Rules You Won’t Find in Michael Pollan’s ‘In Defense of Food’

Wednesday, October 12, 2016

Microbiome: Fibre for the future

Nature: Eric Martens
A chronic lack of dietary fibre has been found to reduce the diversity of bacteria in the guts of mice. This effect is not fully reversed when fibre is reintroduced, and increases in severity over multiple generations. ... Full text

Battle of the data science Venn Diagrams

by David Taylor

Data science is a rather fuzzily defined field; some of the definitions I've heard are:

"Work that takes more programming skills than most statisticians have, and more statistics skills than a programmer has."
"Applied statistics, but in San Francisco."
"The field of people who decide to print 'Data Scientist' on their business cards and get a salary bump."

Personally, I've recently decided to avoid the controversy by calling myself a data spelunker. (Data miners are out of vogue anyway.)
As a field in search of a definition, it's unsurprising that you can find a lot of different attempts to define it.
As a field full of data nerds with a penchant for visualization, it's also unsurprising that a lot of them use Venn diagrams. (Fun fact: John Venn, who invented the eponymous diagrams, and his son filed a patent in 1909 for a lawn bowling machine.)... Full Text

Saturday, October 01, 2016

Stata: Get out-of-sample file predictions

Example:
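webuse auto, clear
regress mpg weight foreign
est store regxb          // keep the fitted model in memory under a name
preserve                 // save the current data so it can be restored later
webuse newautos, clear   // load the out-of-sample data
est restore regxb        // make the stored regression the active estimates
predict mpg              // predicted mpg for the new cars
list
restore                  // return to the original auto data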

Thursday, September 01, 2016

Stata: display system date

di "system date: " c(current_date)
di "system date: " "$S_DATE"
di %td_CY-N-D date("`c(current_date)'","DMY") // the `' quotes around c(current_date) are not necessary
di %td_CY-N-D date("$S_DATE","DMY")
di "system year: " year(date(c(current_date),"DMY")) // w/o `' around c(current_date)
di "system month: " month(date(c(current_date),"DMY"))
di "system day: " day(date(c(current_date),"DMY"))
di "system year: " year(date("$S_DATE","DMY"))
di "system month: " month(date("$S_DATE","DMY"))
di "system day: " day(date("$S_DATE","DMY"))
more examples:

Works ('local' with '=')

local dd=day(date(c(current_date),"DMY"))
local mm=month(date(c(current_date),"DMY"))
local yy=year(date(c(current_date),"DMY"))
log using "output_`yy'_`mm'_`dd'.log", replace
log close

Doesn't work ('local' without '=')

local dd day(date(c(current_date),"DMY"))
local mm=month(date(c(current_date),"DMY"))
local yy=year(date(c(current_date),"DMY"))
log using "output_`yy'_`mm'_`dd'.log", replace // fails with: invalid 'DMY' r(198); without "=", dd holds the literal text of the expression, not its value
log close

Works ('global' with '=')

global dd=day(date(c(current_date),"DMY"))
global mm=month(date(c(current_date),"DMY"))
global yy=year(date(c(current_date),"DMY"))
log using "output $yy-$mm-$dd.log", replace
log close

Sunday, July 31, 2016

Recycle/reuse returned results in Stata

UCLA: "How can I access information stored after I run a command in Stata (returned results)?"
The Stata Blog: Drukker (2015). Programming an estimation command in Stata: Where to store your stuff
Stackoverflow (2014). Saving coefficients and standard errors as variables
Lembcke (2009). Advanced Stata Topics
SSCC. An Introduction to Mata
Stata commands are grouped into 4 major categories: r-class, e-class, s-class, and n-class commands. Also a c-class contains the values of system parameters and settings, along with certain constants.
The commands produce the statistical results are either r-class or e-class. e-class commands produce the estimation results, others are belong to r-class.
After submitting "contrast", Stata generates a L matrix (r(L)), you can check the contrast coefficients using "matrix list r(L)".
If don't know what results are outputted, use "return list" or "ereturn list" to find them. The scalar results from a r-class can be used with the "r(...)" and scalar results from e-class command can be used with "e(...)". Here, "..." is the name showed using "return list" or "ereturn list". The use of results in matrix form is a little tricky. "_b[...]" or "_se[...]" have to be used; here, "..." is the variable name of a coefficient in the model. The results for a constant is used as "_b[_cons]" for beta coefficient or "_se[_cons]" for standard error. A matric results can also converted into a matrix: "mat B=e(b)", then "disp B[rowno, colon]".
To show the variance-covariance matrix, use "estat vce" or simply "matlist e(V)"; to show the correlation matrix, use "estat vce, correlation".
You can "estimates store" and "estimates restore" a set of estimates under a name in memory, so that they are not overwritten by later estimation commands. To keep them in a permanent file, use "estimates save" and "estimates use".
A single number can be stored as a scalar, for example, "scalar xyz=_b[agecat]". When using it, it is safest to wrap the name in the pseudofunction scalar(), for example, "display scalar(xyz)", because a variable with the same name would otherwise take precedence (more info)
The e(V) and e(b) matrices can be converted into variables of a dataset using "svmat" (the reverse, variables into a matrix, is "mkmat"). This is similar to "putmata" and "getmata" for Mata (matrix ref.):

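* coefficients as a two-column matrix (the same column twice, for illustration)
mat D = e(b)', e(b)'
svmat double D, name(coef)
* the diagonal of e(V) holds variances; take square roots in Mata to get standard errors
mata: st_matrix("se1", sqrt(diagonal(st_matrix("e(V)"))))
mat SE = se1, se1
svmat double SE, name(se)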
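
As a minimal sketch of the points above, the following lists and reuses r-class and e-class results. It assumes the auto dataset shipped with Stata (any dataset and model would do); the names B, bw, and m1 are hypothetical.

```stata
sysuse auto, clear

* r-class: summarize leaves its results in r()
summarize mpg
return list
display r(mean)

* e-class: regress leaves its results in e()
regress mpg weight
ereturn list
display e(N)

* coefficients and standard errors by name
display _b[weight]
display _se[_cons]

* copy e(b) into a matrix and index it by row and column number
matrix B = e(b)
display B[1,1]

* store a coefficient in a scalar and refer to it unambiguously
scalar bw = _b[weight]
display scalar(bw)

* keep the estimates under a name, fit another model, then bring them back
estimates store m1
regress mpg weight foreign
estimates restore m1
display _b[weight]
```

After "estimates restore m1", "_b[weight]" refers again to the first model, not to the one fitted in between.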

"ereturn display" uses the e(b) and e(V) matrices to redisplay the coefficient table and returns it as an r-class matrix, "r(table)".
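
A short sketch of picking values out of "r(table)", again assuming the auto dataset shipped with Stata; the matrix name T is hypothetical. The rows of "r(table)" begin with b and se, and the columns follow the order of the coefficients.

```stata
sysuse auto, clear
regress mpg weight

* redisplay the coefficient table; this leaves r(table) behind
ereturn display

* copy r(table) before another command overwrites r()
matrix T = r(table)
matrix list T

display T[1,1]   // row 1 (b), column 1: coefficient on weight
display T[2,1]   // row 2 (se), column 1: its standard error
```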

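"margins" stores its results in r(); with the "post" option it also posts them as e-class results (e(b) and e(V)), so "_b[...]" can be used: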

webuse dollhill3,clear
poisson deaths i.smokes##c.agecat, exposure(pyears)
est store tempreg
margins smokes, gen(dhat) predict(ir) // undocumented gen()
mean dhat1 // for smokes = 0
scalar dhat1=_b[dhat1] // .00810452
margins smokes, eydx(agecat) predict(ir) post
scalar eydxsmokes0=_b[0.smokes] // 1.046826
est restore tempreg
margins smokes, dydx(agecat) predict(ir) post
scalar dydxsmokes0=_b[0.smokes] // .00848402
disp scalar(dydxsmokes0)/scalar(dhat1) // gives 1.046826

Gould (2010). Mata Matters; (2011). Mata, the missing manual. Baum (2009). Using Mata to work more effectively in Stata
putmata and getmata - Put Stata variables into Mata and vice versa

mata r2=(1\2\3)
mata b=st_matrix("e(b)")'
mata se=sqrt(diagonal(st_matrix("e(V)")))
getmata r2 b se, force
vwls b r2, sd(se)
reg b r2

Rename "rowname" and "colname" of a matrix

program estmatrename, eclass
    matrix BB = e(b)
    matrix colnames BB = "1.race" "2.race" "3.race"
    ereturn repost b = BB, rename
    matrix VV = e(V)
    matrix colnames VV = "1.race" "2.race" "3.race"
    matrix rownames VV = "1.race" "2.race" "3.race"
    ereturn repost V = VV
end

total heartatk [pw=swgt], over(race)
estmatrename
lincom (_b[3.race]-_b[1.race])/2
test _b[1.race]=_b[2.race]
contrast {race 1 -1 0}
contrast p(1).race

Convert ln(RR) into RR and percent change

webuse dollhill3
poisson deaths smokes i.agecat, exposure(pyears) irr
margins agecat, predict(ir) post
qui nlcom (lnRR21:ln((_b[2.agecat]/_b[1.agecat])))(lnRR31:ln((_b[3.agecat]/_b[1.agecat]))) (lnRR41:ln((_b[4.agecat]/_b[1.agecat]))), post
ereturn disp,eform(RR) cformat(%5.2f) pformat(%5.4f)
mat rtable=r(table)'
mat RR=rtable[1...,"b"],rtable[1...,"ll".."ul"]
mata st_matrix("pctable",(st_matrix("RR"):-1):*100)
mat coln pctable=RR LL UL
matlist pctable, format(%10.2f)
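
The last steps of the code above rest on two identities: RR = exp(ln RR), and percent change = (RR - 1) × 100. The same transformation applies to the confidence limits. A minimal sketch of the arithmetic with a made-up value (0.25 is hypothetical, not taken from the dollhill3 output):

```stata
* hypothetical log rate ratio
scalar lnRR = 0.25

* back-transform to the ratio scale
scalar RR = exp(lnRR)

* express the ratio as a percent change relative to the reference level
scalar pct = (RR - 1)*100

display "RR = " %5.2f scalar(RR) "   percent change = " %5.1f scalar(pct)
```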

Monday, May 23, 2016

The 21 greatest graduation speeches of the last 60 years

Vox: The 21 greatest graduation speeches of the last 60 years
by German Lopez on May 11, 2016
"Graduation speeches are the last opportunity for a high school or college to educate its students. It's unsurprising, then, that these institutions often pull in some of the world's most powerful people to leave an equally powerful impression on their students. Here are the best of those speeches and some of the sections that resonate the most..." (May 11, 2016)
Read and watch the full article on the Vox website here.

Sunday, May 01, 2016

R! Books


Oscar: Big Book of R collection
RStudio: Cheatsheets
W.N. Venables. An Introduction to R
Hadley Wickham & Garrett Grolemund (2017). R for Data Science
Hadley Wickham. Advanced R
Winston Chang: R Graphics Cookbook, 2nd
Christoph Hanck: Introduction to Econometrics with R
Neale Batra: R for applied epidemiology and public health
David Dalpiaz: Applied Statistics with R (HTML version) GitHub
Colin Gillespie. Efficient R programming
Hadley Wickham (2015). R Packages
Yihui Xie. bookdown: Authoring Books and Technical Documents with R Markdown
Yihui Xie. R Markdown: The Definitive Guide
Julia Silge: Text Mining with R
Patrick Burns (2011). The R Inferno
Daniel Navarro. Learning Statistics with R
Trevor Hastie, Robert Tibshirani, Gareth James, Daniela Witten: An Introduction to Statistical Learning, with Applications in R, 2nd ed. (pdf), with an excellent self-paced video course (the 15 hours of course videos are also compiled by Data School on YouTube)
Trevor Hastie (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Norman Matloff. The Art of R Programming (part)
Michael Crawley (2007). The R Book
Bolker (2007). Ecological models and data in R (2007 draft). Appendix (w/ delta method)
Winston Chang. Cookbook for R
Verzani. simpleR - Using R for Introductory Statistics
Kerns. Introduction to Probability and Statistics Using R
Peng. R Programming for Data Science, The Art of Data Science, Exploratory Data Analysis with R
Yakir. Introduction to Statistical Thinking (With R, Without Calculus)
Aragón. Population Health Data Science with R
Peng Zhao, Yihui Xie, Xiangyun Huang. 现代统计图形 (Modern Statistical Graphics)