- Oscar: Big Book of R collection
- RStudio: Cheatsheets
- W.N. Venables. An Introduction to R
- Hadley Wickham & Garrett Grolemund (2017). R for Data Science
- Hadley Wickham. Advanced R
- Winston Chang: R Graphics Cookbook, 2nd ed.
- Christoph Hanck: Introduction to econometrics with R
- Neale Batra: R for applied epidemiology and public health
- David Dalpiaz: Applied Statistics with R (HTML version) GitHub
- Colin Gillespie. Efficient R programming
- Hadley Wickham (2015). R Packages
- Yihui Xie. bookdown: Authoring Books and Technical Documents with R Markdown
- Yihui Xie. R Markdown: The Definitive Guide
- Julia Silge: Text Mining with R
- Patrick Burns (2011). The R Inferno
- Daniel Navarro. Learning Statistics with R
- Trevor Hastie, Robert Tibshirani, Gareth James, Daniela Witten: An Introduction to Statistical Learning, with Applications in R, 2nd ed. (pdf), with an excellent self-paced video training course (about 15 hours of video, collected by the Data School on YouTube)
- Trevor Hastie (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Norman Matloff. The Art of R Programming (part)
- Michael Crawley (2007). The R Book
- Bolker (2007). Ecological models and data in R (2007 draft). Appendix (w/ delta method)
- Winston Chang. Cookbook for R
- Verzani. simpleR - Using R for Introductory Statistics
- Kerns. Introduction to Probability and Statistics Using R
- Peng. R Programming for Data Science, The Art of Data Science, Exploratory Data Analysis with R
- Yakir. Introduction to Statistical Thinking (With R, Without Calculus)
- Aragón. Population Health Data Science with R
- 赵鹏, 谢益辉, 黄湘云 现代统计图形 (Modern Statistical Graphics)
Sunday, May 01, 2016
R! Books
Thursday, March 31, 2016
how to configure R! environment
Where/how to configure R start-up environment
Several approaches can be used to customize the R working environment (options, library directories, etc.) at R start-up:
- Modify R's site profile directly. The "Rprofile.site" file is under the directory ".\R directory name\etc\" (run "R.home()" or "Sys.getenv("R_HOME")" to find the R installation directory). At startup, R first runs the "Rprofile.site" file, then looks for a user-defined ".Rprofile" file in the current working directory (run "getwd()" to find the current working directory) and, if none is found there, in the user's home directory (run "path.expand("~")" to find where it is). You can edit the "Rprofile.site" file or create a ".Rprofile" file to customize the startup. For more information see Initialization at start of an R session and Customizing startup. I am using R-Portable and prefer to create a ".Rprofile" in the same directory as the "R-Portable.exe" file. That way, I don't need to dig deep and edit R's original settings.
- to list all the options that can be set, run "names(options())"
- to show the value of an item, run "options("option name")", for example:
- "options("digits")" shows "$digits [1] 7", which means numbers are printed with up to 7 significant digits.
- "options("defaultPackages")" shows the packages attached by default when R starts up
- to modify the value of an option, run "options(xxx=yyy)", for example:
- "options(digits=15)" changes the number of digits to 15. Note: this sets the number of significant digits, not the number of decimal places. To control the number of decimal places, use something like "round(4/3, digits=2)" for 2 decimal places; unfortunately "options()" has no setting for this.
- to set the directory of a personal R library, create a ".Rprofile" file in the working directory, include the line ".libPaths(c(.libPaths(), "c:/myRlib directory name"))", and save it.
- or, edit "Rprofile.site" and add the line: ".libPaths(c(.libPaths(), "c:/myRlib directory name"))"
- When using RStudio as the IDE, modify the options file ("Options.R") under ".\Rstudio directory name\R\". The settings in this file override the settings in both R profiles, "Rprofile.site" and ".Rprofile".
- to set the directory of a personal R library, edit the file "Options.R", add the line ".libPaths(c(.libPaths(), "c:/myRlib directory name"))", then save "Options.R".
- or, to use ".Rprofile", the file needs to be in the default working directory used when not in a project (to set this directory in the RStudio GUI: Tools -> Global Options... -> change the "Default working directory (when not in a project):"). You can also change the R installation (R.home()) under "R version:".
- By the way, the options and the directory of package library can also be changed after the start-up of R.
- de Vries (2015). Best practices for handling packages in R projects
- Gillespie. R startup
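As a minimal sketch of the settings discussed in this post, the lines below could be run in a live session or placed in a ".Rprofile". The library location under tempdir() is only a stand-in for a path such as "c:/myRlib directory name"; it is an assumption for illustration, not a recommended location.

```r
# Look up an option and change it temporarily
old_opts <- options(digits = 15)   # options() returns the previous values
print(4 / 3)                       # printed with up to 15 significant digits
print(round(4 / 3, digits = 2))    # decimal places are controlled by round()
options(old_opts)                  # restore the earlier setting

# Append a personal library directory to the library search path.
# my_lib is a hypothetical location; .libPaths() silently drops
# directories that do not exist, so create it first.
my_lib <- file.path(tempdir(), "myRlib")
dir.create(my_lib, showWarnings = FALSE)
.libPaths(c(.libPaths(), my_lib))
print(.libPaths())
```

Saving the old values returned by options() makes it easy to undo a change, which is handy when experimenting before committing a setting to a profile file.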
Saturday, February 27, 2016
Doing Basic Calculus Using R!
Differentiation Rules/Rules for Calculating Derivatives
- Constant: $\frac{d}{dx}\,c = 0$, where $c$ is a constant
- Scalar Multiple: $[c\,f(x)]' = c\,f'(x)$
- Sum and Difference: $[f(x) \pm g(x)]' = f'(x) \pm g'(x)$
- Product: $[f(x)\,g(x)]' = f'(x)\,g(x) + f(x)\,g'(x)$
- Quotient: $[f(x)/g(x)]' = [g(x)\,f'(x) - f(x)\,g'(x)] / g(x)^2$
- Power: $(x^n)' = n\,x^{n-1}$
- Chain Rule: $[f(g(x))]' = f'(g(x))\,g'(x)$
- Exponential: $(e^x)' = e^x$. Arbitrary base: $(b^x)' = b^x \ln b$
- Logarithmic: $(\ln|x|)' = 1/x$. Arbitrary base: $(\log_b x)' = 1/(x \ln b)$
- R can symbolically find the derivative of simple expressions by using the function D() with the function expression(). D() knows how to apply the chain rule as well.
- First derivative: D(expression(x^2), "x") ===> 2 * x
- Higher derivative: D(D(expression(x^2),"x"), "x") ===> 2
- Partial derivative: D(expression((y-x)/y),"x") ===> -(1/y) and D(expression((y-x)/y),"y") ===> 1/y - (y - x)/y^2, which is equal to x/y^2
- with the eval() function, you can evaluate the derivative at particular values of its variables: x = 10; eval(D(expression(x^2), "x")) ===> 20
- D(expression(pnorm(x)),"x") ===> dnorm(x)
- D(expression(dnorm(x)),"x") ===> -(x * dnorm(x))
- R can numerically perform one-dimensional integration using the function integrate()
- integrate(dnorm,-Inf,Inf) ===> 1 with absolute error < 9.4e-05
- integrate(dnorm,-2.58,2.58) ===> 0.99012 with absolute error < 1.9e-08
- integrate(function(x) {x^3 + x}, 0, 1) ===> 0.75 with absolute error < 8.3e-15
- Other differentiation related R packages
- Deriv is for symbolic differentiation.
- Ryacas allows R users to access the yacas computer algebra system that does an excellent job of differentiation.
- Use R to Compute Numerical Integrals
- Derivative Calculator, Integral Calculator
- Symbolab: Partial Derivative Calculator
- WolframAlpha: Derivative Calculator
- Taylor series: $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n$
- If a = 0, the expansion is known as a Maclaurin series.
- Mathematical Annotation to write math symbols and expressions in R graphics (cheat sheet).
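The pieces above can be combined in a short sketch using only base R. The helper taylor_poly() is a hypothetical name introduced here for illustration; it builds a Taylor polynomial by calling D() repeatedly, and is only meant for expressions that D() can differentiate.

```r
# Symbolic derivative with D(), evaluated at a chosen point with eval()
d_expr <- D(expression(x^3 + x), "x")
slope_at_2 <- eval(d_expr, list(x = 2))     # 3 * 2^2 + 1 = 13

# Numerical integration of the same function over [0, 1]; the exact value is 3/4
area <- integrate(function(x) x^3 + x, lower = 0, upper = 1)$value

# Taylor polynomial of `expr` around a, evaluated at x, up to the given order.
# Each pass adds f^(n)(a) / n! * (x - a)^n and then differentiates once more.
taylor_poly <- function(expr, a, x, order) {
  total <- 0
  deriv_n <- expr
  for (n in 0:order) {
    total <- total + eval(deriv_n, list(x = a)) / factorial(n) * (x - a)^n
    deriv_n <- D(deriv_n, "x")
  }
  total
}

# Maclaurin approximation (a = 0) of exp(0.5), compared with the built-in value
approx_exp <- taylor_poly(quote(exp(x)), a = 0, x = 0.5, order = 8)
print(c(slope_at_2, area, approx_exp, exp(0.5)))
```

For higher orders or more complicated expressions, the repeated D() calls can produce long unsimplified expressions, which is one reason packages such as Deriv or Ryacas may be preferable.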
Wednesday, February 10, 2016
Accept-reject algorithm
Accept-reject algorithm (acceptance-rejection method), or rejection sampling, is a simple and general simulation method for drawing observations from a target distribution by accepting or rejecting candidates drawn from an easier proposal distribution. The same accept/reject comparison converts a probability into a dichotomous condition (i.e. yes or no). Basically, there are three steps:
- Step 1. Generate a candidate X from the proposal density g [X ~ g(x)]; f(x) is the target density
- Sample a point (an x-position) from the proposal density g and draw a vertical line at this point, from 0 up to the envelope c*g(X). The target density must stay under the envelope, f(x) <= c*g(x) for all x, where c is a constant and c >= 1.
- Step 2. Generate U from the uniform distribution on the interval (0, c*g(X))
- Sample a height (a y-position) uniformly along the vertical line [equivalently, U = c*g(X) * runif(1, 0, 1)]
- Step 3. If U <= f(X), then keep X ("accept"); otherwise repeat Steps 1 and 2
Pr(accept|X = x) = f(x)/[c*g(x)]
Pr(X = x) = g(x)
Pr(accept) = 1/c
therefore, by Bayes' rule, Pr(X = x|accept) = Pr(accept|X = x) * Pr(X = x) / Pr(accept) = f(x)
Example: Stata simulation and define the event
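clear
set seed 770488 // fix the seed so the simulation is reproducible
set obs 1000 // 1,000 simulated observations
gen x = runiform() - .5 // covariates, uniform on (-0.5, 0.5)
gen z = runiform() - .5
gen xb = x + 8*z // linear predictor: true log-odds coefficients are 1 (x) and 8 (z)
gen y = 1 / (1 + exp(xb)) < runiform() // y = 1 with probability exp(xb)/(1 + exp(xb)), else 0
logistic y x z // fit the logistic regression (reports odds ratios)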
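For comparison, here is a minimal R sketch of the three accept-reject steps. The target and proposal are assumptions chosen for illustration: the target f is the Beta(2, 2) density, the proposal g is Uniform(0, 1) with g(x) = 1, and c = 1.5 because the Beta(2, 2) density peaks at 1.5.

```r
set.seed(770488)

target  <- function(x) dbeta(x, 2, 2)   # target density f (assumed for this sketch)
c_const <- 1.5                          # envelope constant: f(x) <= c * g(x) on (0, 1)
n_draws <- 10000

x <- runif(n_draws)                     # Step 1: candidates from g = Uniform(0, 1)
u <- runif(n_draws, 0, c_const * 1)     # Step 2: heights uniform on (0, c * g(x))
keep <- u <= target(x)                  # Step 3: accept when the height falls under f
accepted <- x[keep]

accept_rate <- mean(keep)               # should be close to 1 / c
print(accept_rate)
print(mean(accepted))                   # Beta(2, 2) has mean 0.5
```

This vectorized version draws all candidates at once and discards the rejected ones instead of looping until acceptance, so the number of accepted draws is random (about n_draws / c).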
Monday, December 21, 2015
ggplot2 2.0.0
I have used the ggplot2 package for a while and really like it. I am happy to see that Hadley Wickham has officially released ggplot2 version 2.0.0. On the RStudio Blog, Hadley highlights several important changes:
- ggplot2 now has an official extension mechanism.
- There are a handful of new geoms, and updates to existing geoms.
- The default appearance has been thoroughly tweaked so most plots should look better.
- Facets have a much richer set of labelling options.
- The documentation has been overhauled to be more helpful, and require less integration across multiple pages.
- A number of older and less used features have been deprecated.
You can find the documentation/manual on the project website. I often go to the dev website to find the latest documentation and vignettes (extension, aesthetic specifications, themes).
The R Graphics Cookbook by Winston Chang is a must-have book for learning ggplot2 and becoming an expert user. You can find the code here from Cookbook-R, and the Google book here.
Tuesday, December 15, 2015
General linear models vs. generalized linear models
| | General linear models | Generalized linear models |
|---|---|---|
| Typical estimation method | | |
| Special cases | ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, mixed model, t-test, F-test | linear regression, logistic regression, Poisson regression, gamma regression |
| Function in R | | |
| Function in Matlab | mvregress() | glmfit() |
| Procedure in SAS | | PROC GENMOD (PROC LOGISTIC for logistic regression only), PROC GLIMMIX (see: Comparing the MIXED and GLIMMIX) |
| Command in Stata | | |
| Function in Mathematica | LinearModelFit | GeneralizedLinearModelFit |
| Command in EViews | ls | |
- Generalized linear models have the flexibility to handle response variables whose distribution is not normal. If a generalized linear model uses an identity link function and a normal family distribution, then this model is equivalent to a general linear model.
- Generalized linear mixed models have the flexibility to model random effects and correlated errors for nonnormal data.
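The equivalence noted above can be checked with a small sketch in base R. It assumes lm() for the general linear model and glm() for the generalized linear model, and uses the built-in mtcars data purely as a convenient example.

```r
# General linear model fitted by least squares
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Generalized linear model with a normal family and identity link
fit_glm <- glm(mpg ~ wt + hp, data = mtcars,
               family = gaussian(link = "identity"))

# The two sets of coefficients agree up to numerical tolerance
same_coef <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm), tolerance = 1e-6))
print(same_coef)

# A non-normal response: logistic regression for the binary variable am
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
print(coef(fit_logit))
```

Changing only the family and link arguments is what moves glm() from the normal-theory case to logistic, Poisson, or gamma regression.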
Non-Probability Sample
Definition
- Sage Publication (pdf)
- Wikipedia
- Laerd
- Wretman (2010) Reflections on Probability vs Nonprobability Sampling (pdf)
- Doherty (1994) Probability versus Non-Probability Sampling in Sample Surveys (pdf)
- Washington Statistical Society Mini-Conference Presentations (9/9/2015, Vimeo)
Friday, November 20, 2015
All-cause mortality was increasing among US middle age Whites
Title: Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century
Authors: Anne Case and Angus Deaton
Abstract: This paper documents a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change reversed decades of progress in mortality and was unique to the United States; no other rich country saw a similar turnaround. The midlife mortality reversal was confined to white non-Hispanics; black non-Hispanics and Hispanics at midlife, and those aged 65 and above in every racial and ethnic group, continued to see mortality rates fall. This increase for whites was largely accounted for by increasing death rates from drug and alcohol poisonings, suicide, and chronic liver diseases and cirrhosis. Although all education groups saw increases in mortality from suicide and poisonings, and an overall increase in external cause mortality, those with less education saw the most marked increases. Rising midlife mortality rates of white non-Hispanics were paralleled by increases in midlife morbidity. Self-reported declines in health, mental health, and ability to conduct activities of daily living, and increases in chronic pain and inability to work, as well as clinically measured deteriorations in liver function, all point to growing distress in this population. We comment on potential economic causes and consequences of this deterioration. Full text: PNAS
Related articles:
- Epimonitor: All Cause mortality rate takes surprising upward turn for middle age Whites in the US
- The Washington Post: Prestigious medical journals rejected stunning study on deaths among middle-aged whites
- The New York Times: Death rates rising for middle-aged White Americans, study finds
- Vox: A big study found spiking death rates for middle-aged white Americans. Critics say it's more complicated
- New York: Anne Case, Co-author of the White Mortality Paper, Responds to a New Critique of Its Approach to Gender (Updated)
Saturday, October 10, 2015
How to recover a lost partition of a hard drive
There are two major reasons you might not see a drive letter on your computer: the logical drive letter has been lost, or the partition table is corrupted.
Try these steps first:
- Open the "Run" dialog by holding the "Windows" key and pressing the "R" key
- Type and run 'diskmgmt.msc'
- "Disk Management" will be shown.
- If you see a partition without a drive letter, right-click on it
- Select "Change Drive Letter and Paths..."
- Click on "Add" button
- Select the drive letter and Click on OK.
- If the partition is still missing, download TestDisk
- Unzip and save it on the USB drive
- Run "testdisk_win"
- At the first window, select “No Log” and press the Enter key
- Select which drive to analyse, choose “Proceed” and press the Enter key
- Select the partition table type (select "Intel" if it’s a PC), then press the Enter key
- Select “Analyse”, then press the Enter key
- Select “Quick Search” at the next screen, then press the Enter key
- Press the Y key if the partitions were created under Vista; press the N key if not
- TestDisk should say “Structure OK”. If so, press the Enter key
- Select “Write”, press the Enter key, then press the Y key to confirm
- Select "OK" at the notice to reboot the computer and press the Enter key
- Now, close TestDisk and RESTART the computer.
Sunday, June 28, 2015
R! documentation and Learning Resources
- Blog: R commands and keyboard shortcuts
- Blog: R! Books
- Rdocumentation of Datacamp is a tool that helps you easily find and browse the documentation of all current and some past packages on CRAN.
- CRAN Task View organizes the packages into different groups such as Graphics, Survival Analysis, etc.
- R_note -- The Exploration of Statistical Software R
- R! Tips
- Resources of learning R, UCLA
- Quick-R is to help you quickly access this language in your work
- William N Venables, David M Smith, and the R Core Team. An Introduction to R (on-going updated)
- Daniel Navarro (2015). Learning Statistics with R
- Sharon Machlis (2013). Beginner's guide to R: Introduction (pdf), and (2016) Advanced Beginner's Guide to R (pdf). 60+ R resources to improve your data skills
- Emmanuel Paradis. R for Beginners
- Paul Johnson (2014). R Tips
- ZevRoss. Beautiful plotting in R: A ggplot2 Cheatsheet (pdf)
- Alboukadel Kassambara. ggplot2: Guide to Create Beautiful Graphics in R
- Jay Kerns (2010). Introduction to Probability and Statistics Using R
- Julian Faraway (2002) Practical Regression and Anova using R
- Benjamin Yakir (2011). Introduction to Statistical Thinking (With R, Without Calculus)
- Paul Hewson (2009). Multivariate Statistics with R
- Hadley Wickham. ggplot2 documents (Plotly ggplot2 docs with Plotly's open-source ggplotly converter)
- RStudio. Cheat Sheets, Webinars and Video
- Norman Matloff (2009). The Art of R Programming
- Gillespie (2016). Efficient R Programming
- Scherer: A ggplot2 tutorial for beautiful plotting in R
- Thomas Lumley (2011). Course Notes - Complex sampling and R use survey package. withReplicates {survey} computes variances by replicate weighting.
- Anthony Damico: analyze survey data for free, unlocked public-use data sets, twotorials
- Aedin Culhane: Introduction to R, CDC, June 25-28th 2012 (one of my favorite training courses I have attended). Here is an Amazon list of R textbooks by Aedin.
- Yiling Cheng (2016). Doing Basic Calculus using R! etc.
- Verzani (2002). simpleR - Using R for Introductory Statistics (pdf)
- Patrick Burns (1998). S Poetry
- Zhou (2010). Fun with the R Grid Package
- Torres-Reyna: R: A language and Environment for statistical computing (Full manual)
- Getting started in R~Stata notes on exploring data (pdf)
- Muenchen: R for SAS and SPSS Users (pdf). the list of SAS equivalent packages.
- Maindonald: Using R for Data Analysis and Graphics (pdf)
- Data Analysts Captivated by R’s Power from the New York Times.
- Kadane. Principles of Uncertainty (pdf)
- Kerns. Introduction to Probability and Statistics Using R (pdf)
- Gaston. Handling and Processing Strings in R
- Frank Harrell. Regression Modeling Strategies and the rms Package
- Effective Graphs with Microsoft R Open
- Roger Peng and Jeff Leek. Git and GitHub videos for beginners
- Roger Peng etc. Course materials of Data Science Specialization
- Kevin Markham (2014) Hands-on dplyr tutorial for faster data manipulation in R
- Hopper (2016). The Excel User's introduction to R!
- Rickert. Getting Started with Markov Chains
- Wikibook. R programming
- Walker (2014). International population pyramids with ggplot2
- Demos: A Language, not a Letter: Learning Statistics in R
Monday, June 01, 2015
Running and Longevity
News
- Daily Mail: Stop that binge jogging
- Long term study found slow joggers had the lowest rates of death
- Strenuous joggers were as likely to die as sedentary non-joggers
- Going jogging three times a week for no more than 2.4 hours is optimal
- The researchers noted that the pace of the slow joggers corresponds to vigorous exercise, and strenuous jogging corresponds to very vigorous exercise
- ScienceDaily: Light jogging may be most optimal for longevity
- CDC: How much physical activity do you need?
- Schnohr (2015). Dose of Jogging and Long-Term Mortality: the Copenhagen City Heart Study.
- Mittleman (1993). Triggering of Acute Myocardial Infarction by Heavy Physical Exertion
- Cheng (2000): Physical activity and self-reported, physician-diagnosed osteoarthritis: is physical activity a risk factor?
- Cheng (2012): Prevalence of diagnosed arthritis and arthritis-attributable activity limitation among adults with and without diagnosed diabetes