- Cheatsheets, The R Guide, R Reference Card
- Help functions: help()/?, apropos(), find(), methods(), example(), demo(), vignette(), args(). apropos() lists all objects whose names match a pattern; find() shows where (in which package or environment) a found object is located.
- Housekeeping functions: getwd(), setwd(); rm(list=ls()) removes all objects in the current R environment; source("myRscript.r") runs the R code in the file "myRscript.r"; fix() modifies the original object in place, while edit() edits a copy and returns it as a new object; download.file() downloads a file from the Internet; attach()/detach() objects; search() shows the current search path in order; install.packages(), update.packages(), remove.packages(); getOption("defaultPackages") shows the packages attached at startup, which can be changed by setting the option in startup code (e.g. in ~/.Rprofile); .libPaths()
- Numeric/character functions: length(), seq(), rep(), cut(), pretty(), cat(), substr(), grep(), sub(), strsplit(), paste(), toupper(), tolower()
- Data functions: read.table(), head(), tail(), str(), class(), length(), dim(), nrow(), ncol(), names(), levels(), c(), cbind(), rbind(), append(), rep(), rev(), sort(), unique()
- Type functions: "is." for checking or "as." for conversion + numeric(), character(), vector(), matrix(), data.frame(), factor(), logical(), integer(). For example: is.numeric(), as.numeric()
- Mathematical functions: abs(), sqrt(), log(), log(x, base=n), log10(), exp(), prod(), factorial(), choose(), ceiling(), floor(), solve(), trunc(), round(), signif(), cos(), sin(), tan(), acos()
- Statistical functions: mean(), median(), sd(), var(), mad(), quantile(), range(), sum(), diff(), min(), max(), scale(), fivenum(), cumsum(), cumprod(), cummax(), cummin(), cor(), colSums(), rowSums(), colMeans(), rowMeans()
- Probability functions: the form is [d][p][q][r]distribution(), where d, p, q, and r stand for the (d)ensity, the cumulative (p)robability/distribution function, the (q)uantile function, and (r)andom generation, respectively. The distribution types include: (norm)al, (beta), (binom)ial, (chisq)uared, (exp)onential, (logis)tic, (multinom)ial, (n)egative (binom)ial, (pois)son, (f), (gamma), (t), (unif)orm, etc. For example: dnorm(), pnorm(), qnorm(), rnorm()
- Statistical modeling functions
- Model functions: lm(), glm(), nls(), nls2(), lme() / nlme()
- Symbol formulas (y ~ A + B + C): ":" is for an interaction term, "*" is for complete interaction (main effects plus interactions), "^" is for crossing to a specified degree, "." is a placeholder for all other variables except the dependent variable, "-" removes a variable from the equation, "-1" suppresses the intercept, and "I()" has the elements within the parentheses interpreted arithmetically
- Post-estimation functions: coef(), confint(), resid(), fitted(), summary(), predict(), deviance(), print(), plot(), formula(), anova(obj1, obj2), AIC(), vcov()
- Contrast functions: contr.helmert(), contr.poly(), contr.sum(), contr.treatment(), contr.SAS()
- RStudio is an integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. Shortcuts (you can modify them: Tools -> Modify Keyboard Shortcuts...)
- Alt + Shift + K: Show a Quick Reference
- Alt + -: Insert assignment operator "<-"
- Ctrl + Shift + M: Insert pipe operator "%>%" (I changed it to Ctrl + Shift + P)
- Ctrl + Alt + I: Insert chunk (R Notebook/Markdown)
- Ctrl + 1: Move cursor to source Editor window
- Ctrl + 2: Move cursor to Command window
- Ctrl + 3: Move cursor to Help window
- Ctrl + 4: Move cursor to History window
- Ctrl + 5: Move cursor to File window
- Ctrl + 6: Move cursor to Plots window
- ...
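As a minimal sketch of the [d][p][q][r] naming scheme listed under the probability functions above, the following uses the normal and binomial distributions from base R. The seed and all numeric arguments are arbitrary choices for illustration only.

```r
# d/p/q/r prefixes applied to the normal distribution
dens  <- dnorm(0)          # density at 0, equal to 1/sqrt(2*pi)
cprob <- pnorm(1.96)       # cumulative probability P(Z <= 1.96)
quant <- qnorm(cprob)      # quantile function inverts pnorm(), giving back 1.96
set.seed(1)                # arbitrary seed, only for reproducibility
draws <- rnorm(5)          # five random draws from N(0, 1)

# The same prefixes work for other distributions, e.g. the binomial
p_eq3 <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3)  = 120/1024
p_le3 <- pbinom(3, size = 10, prob = 0.5)   # P(X <= 3) = 176/1024

print(c(dens = dens, cprob = cprob, quant = quant, p_eq3 = p_eq3, p_le3 = p_le3))
```

The quantile and cumulative functions are inverses of each other, which is a handy sanity check when learning a new distribution family.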
Disclaimer: This blog site is intended solely for sharing of information. Comments are warmly welcome, but I make no warranties regarding the quality, content, completeness, suitability, adequacy, sequence, or accuracy of the information.
Friday, March 24, 2017
R functions and keyboard shortcuts
R functions/commands and keyboard shortcuts
Monday, March 13, 2017
Choice of analytical language
I have mainly used three statistical languages, Stata, R, and SAS, for many years and for different purposes. My relative usage of the three has shifted from SAS-Stata-R to SAS-R-Stata and then to Stata-R-SAS. Sometimes I am asked to recommend the best analytic language, which is always a hard and complicated question for me. I came across a blog post written by Curtis Miller that is very thoughtful and helpful for making this kind of choice: "On Programming Languages; Why My Dad Went From Programming to Driving a Bus". Hopefully his story can help you make your own decision.
Wednesday, March 08, 2017
Stata News: in the spotlight
Stata News: in the spotlight etc.
- 2021
- Stata's growing interoperability: The case of PyStata & Jupyter Notebook
- Naqvi: Stata and GitHub Integration
- Customizable tables in Stata 17
- Creating dynamic HTML documents with Stata output
- 2020
- Enhancements to survival analysis suite
- Using Python within Stata
- Community corner: Graph Workflow
- Using margins to interpret choice model results
- Bayesian inference using multiple Markov chains
- 2019
- Customized forest plots for displaying meta-analysis results
- Importing data from SPSS and SAS
- Fun with frames
- Lasso
- Interpreting models for log-transformed outcomes (unbiased prediction: $E(Y|X) = e^{XB} e^{\sigma^2/2}$)
- User's corner: ftools and gtools
- 2018
- Scheming your way to your favorite graph style
- User's corner: Machine learning
- Nonparametric regression—Estimation, inference, and effects
- User's corner: A little help with Mata from the SSCC
- Dynamic stochastic general equilibrium models for policy analysis
- User's corner: ietoolkit for everyday tasks
- Interval-censored survival data—model fitting and beyond
- User's corner: Network analysis made easy
- 2017
- In the spotlight: Nonlinear multilevel mixed-effects models
- Cheatsheet: User's Corner: Quick references for your favorite commands
- What's new in Stata 15 (released on 2017-06-06, 15.1 released on 2017-12-20)
- Visualizing continuous-by-continuous interactions with margins and twoway contour
- 2016
- Storing long strings and entire files in Stata datasets
- Estimating, graphing, and interpreting interactions using margins
- eteffects and the challenges of making causal inferences
- Bayesian IRT–4PL model
- 2015
- Easy-to-interpret, flexible survival-time treatment effects, and Postestimation Selector
- Treatment effects, and irt
- Bayesian “random-effects” models, and What's New in Stata 14
- Finding and using results, constants, functions ... anything (Data > Other utilities > Hand calculator), and forecast for dynamic panel data and counterfactuals
- 2014
- 2013
- New univariate time-series features added in 13.1, and Adding your own methods to analyze power and sample size
- mlexp, and meglm, What's New in Stata 13
- 2012
- marginsplot and Fractals
- mgarch, and Receiver operating characteristic curves
- import excel and export excel
- 2011
- state-space models: Easier than they look
- SEM for economists (and others who think they don’t care), What's New in Stata 12
- The data editor
- 2010
- Competing-risks regression
- Margins of predicted outcomes
- Factor variables, and What's New in Stata 11.1
- Multiple imputation
- 2009: What's New in Stata 11.0
- 2008: Stata 10.1 Update
- 2007: What's New in Stata 10
- 2005: What's New in Stata 9
- 2002: What's New in Stata 8
- 2000: What's New in Stata 7
- 1985-1999: History of Stata
Friday, March 03, 2017
Syndemics: health in context
Syndemic, a term coined by Merrill Singer in the mid-1990s, is a conceptual framework for understanding diseases or health conditions that arise in populations and that are exacerbated by the social, economic, environmental, and political milieu in which a population is immersed. Today's issue of The Lancet published a series on syndemics... full text ...
Tuesday, January 24, 2017
Information about the Global Burden of Diseases, Injuries, and Risk Factors Study
- WHO: the Global Burden of Disease (GBD) project
- Lancet Global Burden of Disease
- UW website, directed by Christopher Murray
- Wikipedia: History of GBD
- List of causes and ICD9 and ICD10 of these causes in eTable 2 of supplement
- Search titles containing "Global Burden of Disease Study" on PubMed
Tuesday, January 03, 2017
Cheng YJ, Gregg EW, Rolka DB, Thompson TJ.
Wednesday, November 30, 2016
Use French cleat to hold things
- Wikipedia. What is the French cleat?
- Popular Mechanics. How to Build a French Cleat Shelf to Hold Virtually Anything
- Family Handyman. Custom Garage Storage (Video)
Tuesday, November 29, 2016
Interview with J.J. Allaire
Interview with J.J. Allaire - the founder of RStudio
by Joseph Rickert
Welcome to “R Views”, the new R Community blog from RStudio. For this first post, I sat down with J.J. Allaire, RStudio’s founder and CEO, to discuss RStudio’s history, its mission and JJ’s vision for its future. In a short time, we touched on a wide range of subjects including RStudio’s business, the growth of the R language, the importance of the R Consortium to the R Community and J.J.’s advice to anyone coming to R for the first time. We hope you enjoy this “snapshot” of RStudio’s place in the R world. full text
You can also read a Chinese version here.
Thursday, October 13, 2016
'Big Fat Fix' Film Challenges Mediterranean Diet
An Interview With Cardiologist Aseem Malhotra
Editor's Note: Cardiologist Aseem Malhotra, MBChB, MRCP, talks about his new documentary The Big Fat Fix, which sent him to Pioppi, Italy, the village where Ancel Keys researched diet and cardiovascular health. A regular contributor to the BMJ and major UK newspapers on the topic of dietary health, Dr Malhotra believes that the demonization of fat let sugar off the hook as the real culprit in the diabetes, obesity, and cardiovascular disease epidemic, and that we need to rethink our approach to exercise. ... Full Text.
This article offers another interesting fact-based opinion, viewed from a different angle. The interview reminds me of Michael Pollan's book In Defense of Food, published in 2008: Food – Not Nutrients – Is The Fundamental Unit In Nutrition. (PBS Documentary In Defense of Food in Dec. 2015, PBS Newshour and on YouTube).
Food Insight (2015). 4 Food Rules You Won’t Find in Michael Pollan’s ‘In Defense of Food’
Wednesday, October 12, 2016
Microbiome: Fibre for the future
Nature: Eric Martens
A chronic lack of dietary fibre has been found to reduce the diversity of bacteria in the guts of mice. This effect is not fully reversed when fibre is reintroduced, and increases in severity over multiple generations. ... Full text
Battle of the Data Science Venn Diagrams
by David Taylor
Data science is a rather fuzzily defined field; some of the definitions I've heard are:
- "Work that takes more programming skills than most statisticians have, and more statistics skills than a programmer has."
- "Applied statistics, but in San Francisco."
- "The field of people who decide to print 'Data Scientist' on their business cards and get a salary bump."
As a field in search of a definition, it's unsurprising that you can find a lot of different attempts to define it.
As a field full of data nerds with a penchant for visualization, it's also unsurprising that a lot of them use Venn diagrams. (Fun fact: John Venn, who invented the eponymous diagrams, and his son filed a patent in 1909 for a lawn bowling machine.)... Full Text
Saturday, October 01, 2016
Stata: Get out-of-sample file predictions
Example:
webuse auto,clear
regress mpg weight foreign
est store regxb
preserve
webuse newautos,clear
est restore regxb
predict mpg
list
restore
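For readers who also work in R, a roughly analogous sketch is shown here: fit a model on one dataset, then predict for rows that were not used in estimation. It uses the built-in mtcars data rather than Stata's auto data, and the data frame new_cars with its values is hypothetical, made up only for illustration.

```r
# Fit a linear model on the estimation sample (built-in mtcars data)
fit <- lm(mpg ~ wt + am, data = mtcars)

# Hypothetical out-of-sample cars: weight in 1000 lbs, am = 1 for manual
new_cars <- data.frame(wt = c(2.5, 3.2), am = c(1, 0))

# predict() applies the stored coefficients to the new rows
new_cars$mpg_hat <- as.numeric(predict(fit, newdata = new_cars))
print(new_cars)
```

As in the Stata example, the fitted model object plays the role of the stored estimates, so the original data do not need to be in memory when predicting.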
Thursday, September 01, 2016
Stata: display system date
- .di "system date:" c(current_date)
- .di "system date:" "$S_DATE"
- .di %td_CY-N-D date("`c(current_date)'","DMY") // "` '" are not necessary
- .di %td_CY-N-D date("$S_DATE","DMY")
- .di "system year: " year(date(c(current_date),"DMY")) // w/o `' around c(current_date)
- .di "system month: " month(date(c(current_date),"DMY"))
- .di "system day:" day(date(c(current_date),"DMY"))
- .di "system year: " year(date("$S_DATE","DMY"))
- .di "system month: " month(date("$S_DATE","DMY"))
- .di "system day:" day(date("$S_DATE","DMY"))
- more examples:
- Works ('local' with '=')
- local dd=day(date(c(current_date),"DMY"))
- local mm=month(date(c(current_date),"DMY"))
- local yy=year(date(c(current_date),"DMY"))
- log using "output_`yy'_`mm'_`dd'.log", replace
- log close
- Doesn't work ('local' without '=')
- local dd day(date(c(current_date),"DMY"))
- local mm=month(date(c(current_date),"DMY"))
- local yy=year(date(c(current_date),"DMY"))
- log using "output_`yy'_`mm'_`dd'.log", replace
- log close // invalid 'DMY' r(198)
- Works ('global' with '=')
- global dd=day(date(c(current_date),"DMY"))
- global mm=month(date(c(current_date),"DMY"))
- global yy=year(date(c(current_date),"DMY"))
- log using "output $yy-$mm-$dd.log", replace
- log close
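A comparable sketch in R, for readers who keep logs from both languages, builds a dated file name from the system date. The "output" prefix follows the Stata example above; note that format() zero-pads month and day, whereas Stata's month() and day() return unpadded numbers.

```r
# Build a dated log file name from the system date
today <- Sys.Date()
yy <- format(today, "%Y")   # four-digit year
mm <- format(today, "%m")   # zero-padded month
dd <- format(today, "%d")   # zero-padded day
log_name <- sprintf("output_%s_%s_%s.log", yy, mm, dd)
print(log_name)
```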
Sunday, July 31, 2016
Recycle/reuse returned results in Stata
- UCLA: "How can I access information stored after I run a command in Stata (returned results)?"
- The Stata Blog: Drukker (2015). Programming an estimation command in Stata: Where to store your stuff
- Stackoverflow (2014).Saving coefficients and standard errors as variables
- Lembcke (2009). Advanced Stata Topics
- SSCC. An Introduction to Mata
- Stata commands are grouped into 4 major categories: r-class, e-class, s-class, and n-class commands. In addition, c-class values hold system parameters and settings, along with certain constants.
- Commands that produce statistical results are either r-class or e-class: e-class commands produce estimation results; the others belong to r-class.
- After running "contrast", Stata stores an L matrix in r(L); you can check the contrast coefficients using "matrix list r(L)".
- If you don't know what results are returned, use "return list" or "ereturn list" to find them. Scalar results from an r-class command are accessed with "r(...)" and scalar results from an e-class command with "e(...)", where "..." is the name shown by "return list" or "ereturn list". Using results stored in matrix form is a little trickier: use "_b[...]" or "_se[...]", where "..." is the name of a coefficient in the model. The results for the constant are "_b[_cons]" for the beta coefficient and "_se[_cons]" for the standard error. Matrix results can also be copied into a matrix: "mat B=e(b)", then "disp B[rowno, colno]".
- To show the variance-covariance matrix, use "estat vce" or simply "matlist e(V)"; to show the correlation matrix, use "estat vce, correlation".
- You can use "estimates store" and "estimates restore" to keep a named set of estimates in memory, so that they are not overwritten by subsequent commands. To save and reuse them as a permanent file, use "estimates save" and "estimates use".
- A single number can be stored as a scalar, for example, "scalar xyz=_b[agecat]". To avoid confusion with a variable of the same name, it is safest to refer to it with the pseudofunction scalar(), for example, "display scalar(xyz)" (more info)
- The e(V) and e(b) matrices can be converted into variables of a dataset using "svmat" (convert variables into a matrix using "mkmat"), which is similar to "putmata" and "getmata" for Mata (matrix ref.):
- mat D = e(b)', e(b)'
- svmat double D, name(coef)
- mat se1=vecdiag(e(V))
- mat se2=vecdiag(e(V))
- mat SE = se1, se2
- svmat SE, name(se)
- "ereturn display" uses the e(V) and e(b) matrices and returns an r-class matrix, "r(table)"
- "margins" also gives e-class results:
- webuse dollhill3,clear
- poisson deaths i.smokes##c.agecat, exposure(pyears)
- est store tempreg
- margins smokes, gen(dhat) predict(ir) // undocumented gen()
- mean dhat1 // for smokes = 0
- scalar dhat1=_b[dhat1] // .00810452
- margins smokes, eydx(agecat) predict(ir) post
- scalar eydxsmokes0=_b[0.smokes] // 1.046826
- est restore tempreg
- margins smokes, dydx(agecat) predict(ir) post
- scalar dydxsmokes0=_b[0.smokes] // .00848402
- disp scalar(dydxsmokes0)/scalar(dhat1) // gives 1.046826
- Gould(2010).Mata Matters; (2011).Mata, the missing manual. Baum(2009).Using Mata to work more effectively in Stata
- putmata and getmata - Put Stata variables into Mata and vice versa
- mata r2=(1\2\3)
- mata b=st_matrix("e(b)")'
- mata se=sqrt(diagonal(st_matrix("e(V)")))
- getmata r2 b se, force
- vwls b r2, sd(se)
- reg b r2
- Rename "rowname" and "colname" of a matrix
program estmatrename, eclass
matrix BB = e(b)
matrix colnames BB = "1.race" "2.race" "3.race"
ereturn repost b = BB, rename
matrix VV = e(V)
matrix colnames VV = "1.race" "2.race" "3.race"
matrix rownames VV = "1.race" "2.race" "3.race"
ereturn repost V = VV
end
- total heartatk [pw=swgt], over(race)
- estmatrename
- lincom (_b[3.race]-_b[1.race])/2
- test _b[1.race]=_b[2.race]
- contrast {race 1 -1 0}
- contrast p(1).race
- Convert ln(RR) into RR and percent change
- webuse dollhill3
- poisson deaths smokes i.agecat, exposure(pyears) irr
- margins agecat, predict(ir) post
- qui nlcom (lnRR21:ln((_b[2.agecat]/_b[1.agecat])))(lnRR31:ln((_b[3.agecat]/_b[1.agecat]))) (lnRR41:ln((_b[4.agecat]/_b[1.agecat]))), post
- ereturn disp,eform(RR) cformat(%5.2f) pformat(%5.4f)
- mat rtable=r(table)'
- mat RR=rtable[1...,"b"],rtable[1...,"ll".."ul"]
- mata st_matrix("pctable",(st_matrix("RR"):-1):*100)
- mat coln pctable=RR LL UL
- matlist pctable, format(%10.2f)r
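The arithmetic behind the last steps above (exponentiate ln(RR) to get RR, then subtract 1 and multiply by 100 for the percent change) can be sketched in R as follows. The log rate ratio and its confidence limits here are hypothetical numbers chosen for illustration, not results from the dollhill3 data.

```r
# Hypothetical log rate ratio with lower and upper confidence limits
ln_rr <- c(est = log(1.25), ll = log(1.10), ul = log(1.42))

rr <- exp(ln_rr)               # back-transform ln(RR) to RR
pct_change <- (rr - 1) * 100   # percent change relative to the reference

print(round(rbind(RR = rr, pct_change = pct_change), 2))
```

Because exp() is monotone, the confidence limits can be transformed term by term in the same way as the point estimate.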
Monday, May 23, 2016
The 21 greatest graduation speeches of the last 60 years
Vox: The 21 greatest graduation speeches of the last 60 years
by German Lopez on May 11, 2016
"Graduation speeches are the last opportunity for a high school or college to educate its students. It's unsurprising, then, that these institutions often pull in some of the world's most powerful people to leave an equally powerful impression on their students. Here are the best of those speeches and some of the sections that resonate the most..." (May 11, 2016)
Read and watch the full article on the Vox website here.
Sunday, May 01, 2016
R! Books
- Oscar: Big Book of R collection
- RStudio: Cheatsheets
- W.N. Venables. An Introduction to R
- Hadley Wickham & Garrett Grolemund (2017). R for Data Science
- Hadley Wickham. Advanced R
- Winston Chang: R Graphics Cookbook, 2nd
- Christoph Hanck: Introduction to econometrics with R
- Neale Batra: R for applied epidemiology and public health
- David Dalpiaz: Applied Statistics with R (HTML version) GitHub
- Colin Gillespie. Efficient R programming
- Hadley Wickham (2015). R Packages
- Yihui Xie. bookdown: Authoring Books and Technical Documents with R Markdown
- Yihui Xie. R Markdown: The Definitive Guide
- Julia Silge: Text Mining with R
- Patrick Burns (2011). The R Inferno
- Daniel Navarro. Learning Statistics with R
- Trevor Hastie, Robert Tibshirani, Gareth James, Daniela Witten: An Introduction to Statistical Learning, with Applications in R, 2nd ed. (pdf), with an excellent self-paced video training course (here are the 15 hours of training videos compiled by the Data School (YouTube))
- Trevor Hastie (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Norman Matloff. The Art of R Programming (part)
- Michael Crawley (2007). The R Book
- Bolker (2007).Ecological models and data in R (2007 draft). Appendix (w/ delta method)
- Winston Chang. Cookbook for R
- Verzani. simpleR - Using R for Introductory Statistics
- Kerns. Introduction to Probability and Statistics Using R
- Peng. R Programming for Data Science, The Art of Data Science, Exploratory Data Analysis with R
- Yakir. Introduction to Statistical Thinking (With R, Without Calculus)
- Aragón. Population Health Data Science with R
- 赵鹏, 谢益辉, 黄湘云 现代统计图形 (Modern Statistical Graphics)
Thursday, March 31, 2016
how to configure R! environment
Where/how to configure R start-up environment
There are several approaches that can be used to customize the R working environment, such as options and the library directory, at R start-up:
- Modify R's site profile file directly. The "Rprofile.site" file is under the "etc" directory of the R installation, ".\R directory name\etc\" (run "R.home()" or "Sys.getenv("R_HOME")" to find the installation directory). At startup, R reads the "Rprofile.site" file and then looks for a user-defined ".Rprofile" file in the current working directory (run "getwd()" to find it) or, failing that, in the user's home directory (run "path.expand("~")" to find where it is). You can edit the "Rprofile.site" file or create a ".Rprofile" file to customize the startup. For more information see Initialization at start of an R session and Customizing startup. I am using R-Portable and prefer to create a ".Rprofile" in the same directory as the "R-Portable.exe" file. That way, I don't need to dig deep and edit R's original settings.
- to list all the options that can be set, run "names(options())"
- to show the value of an item, run "options("option name")", for example:
- "options("digits")" shows "$digits, [1] 7", which means numbers are printed with 7 significant digits.
- "options("defaultPackages")" shows the packages attached by default when R starts up
- to modify the value of an option, run "options(xxx=yyy)", for example:
- "options(digits=15)" changes the number of digits to 15. Note: this sets the total number of significant digits, not the number of decimal places. To set the number of decimal places, use something like "round(4/3, digits=2)" for 2 decimal places; unfortunately this cannot be done in "options()".
- to set the directory of a personal R library, create a ".Rprofile" file in the working directory, include ".libPaths(c(.libPaths(),"c:/myRlib directory name"))", and save it.
- or edit "Rprofile.site" and add the line: ".libPaths(c(.libPaths(),"c:/myRlib directory name"))"
- When using RStudio as the IDE, modify the options file ("Options.R") under ".\Rstudio directory name\R\". These option settings override those in the R profiles, both "Rprofile.site" and ".Rprofile".
- to set the directory of a personal R library, edit the file "Options.R", add the line ".libPaths(c(.libPaths(),"c:/myRlib directory name"))", then save "Options.R".
- or, to use ".Rprofile", the file needs to be in the default working directory used when not in a project (to set this directory in the RStudio GUI: Tools -> Global Options... -> change "Default working directory (when not in a project):"). You can also change the R installation (R.home()) under "R version:".
- By the way, the options and the package library directory can also be changed after R has started.
- de Vries (2015).Best practices for handling packages in R projects
- Gillespie. R startup
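The kind of lines that might go into a ".Rprofile" can be sketched as below. This is only an illustration under stated assumptions: the digits value of 4 and the "myRlib" folder (created here under tempdir() so the sketch runs anywhere) are hypothetical; in a real profile you would point to your own permanent library directory.

```r
# Sketch of start-up customisation (assumed contents, not the author's file)

# 1. Change an option; options() returns the previous values invisibly
old_opts <- options(digits = 4)

# 2. Append a personal library directory to the library search path.
#    .libPaths() silently ignores directories that do not exist,
#    so the hypothetical folder is created first.
my_lib <- file.path(tempdir(), "myRlib")
dir.create(my_lib, showWarnings = FALSE)
.libPaths(c(.libPaths(), my_lib))

# 3. Inspect the result
print(getOption("digits"))
print(.libPaths())

# Decimal places are controlled per call, not through options()
two_dp <- round(4 / 3, digits = 2)   # 1.33
```

The same two calls, options() and .libPaths(), also work interactively after start-up, which is a convenient way to try a setting before committing it to a profile file.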
Saturday, February 27, 2016
Doing Basic Calculus Using R!
Differentiation Rules/Rules for Calculating Derivatives
- Constant: $(c)' = 0$, where $c$ is a constant
- Scalar Multiple: $[c f(x)]' = c f'(x)$
- Sum and Difference: $[f(x) \pm g(x)]' = f'(x) \pm g'(x)$
- Product: $[f(x) g(x)]' = f'(x) g(x) + f(x) g'(x)$
- Quotient: $[f(x) / g(x)]' = [g(x) f'(x) - f(x) g'(x)] / g(x)^2$
- Power: $(x^n)' = n x^{n-1}$
- Chain Rule: $[f(g(x))]' = f'(g(x)) g'(x)$
- Exponential: $(e^x)' = e^x$. Arbitrary base: $(b^x)' = b^x \ln b$
- Logarithmic: $(\ln|x|)' = 1/x$. Arbitrary base: $(\log_b x)' = 1/(x \ln b)$
- R can symbolically find the derivative of many functions by using the function D() together with expression(). R knows how to use the chain rule as well.
- First derivative: D(expression(x^2), "x") ===> 2 * x
- Higher derivative: D(D(expression(x^2),"x"), "x") ===> 2
- Partial derivative: D(expression((y-x)/y),"x") ===> -(1/y) and D(expression((y-x)/y),"y") ===> 1/y - (y - x)/y^2, which is equal to x/y^2
- with the eval() function, you can get the value using particular values of its parameters: x =10; eval(D(expression(x^2), "x")) ===> 20
- D(expression(pnorm(x)),"x") ===> dnorm(x)
- D(expression(dnorm(x)),"x") ===> -(x * dnorm(x))
- R can numerically perform one-dimensional integration using the function integrate()
- integrate(dnorm,-Inf,Inf) ===> 1 with absolute error < 9.4e-05
- integrate(dnorm,-2.58,2.58) ===> 0.99012 with absolute error < 1.9e-08
- integrate(function(x) {x^3 + x}, 0, 1) ===> 0.75 with absolute error < 8.3e-15
- Other differentiation related R packages
- Deriv is for symbolic differentiation.
- Ryacas allows R users to access the yacas computer algebra system that does an excellent job of differentiation.
- Use R to Compute Numerical Integrals
- Derivative Calculator, Integral Calculator
- Symbolab: Partial Derivative Calculator
- WolframAlpha: Derivative Calculator
- Taylor series: $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$
- If a = 0, the expansion is known as a Maclaurin series.
- Mathematical Annotation to write math symbols and expressions in R graphics (cheat sheet).
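The pieces above (symbolic derivative with D(), evaluation with eval(), numerical integration with integrate(), and a truncated Maclaurin series) can be combined in one short base-R sketch. The function x^3 + x is taken from the integrate() example above; the evaluation point x = 2 and the degree-5 truncation of exp(x) are arbitrary choices for illustration.

```r
# Symbolic derivative of x^3 + x
f_expr <- expression(x^3 + x)
dfdx <- D(f_expr, "x")                  # 3 * x^2 + 1
slope_at_2 <- eval(dfdx, list(x = 2))   # 3 * 4 + 1 = 13

# Numerical integral of the same function over [0, 1]
area <- integrate(function(x) x^3 + x, lower = 0, upper = 1)$value   # 0.75

# Maclaurin polynomial (Taylor series about a = 0) of exp(x), degree 5:
# every derivative of exp at 0 equals 1, so the terms are x^n / n!
n <- 0:5
maclaurin_exp <- function(x) sum(x^n / factorial(n))
approx_val <- maclaurin_exp(0.5)

print(dfdx)
print(c(slope = slope_at_2, area = area, approx = approx_val, exact = exp(0.5)))
```

Comparing approx and exact in the printed output shows how quickly the series converges near the expansion point.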
Wednesday, February 10, 2016
Accept-reject algorithm
The accept-reject algorithm (acceptance-rejection method), or rejection sampling, is a simple and general simulation method for drawing observations from a target distribution with density f(x) by using draws from a proposal distribution with density g(x). In the same way, we can convert a probability into a dichotomous condition (i.e. yes or no), so that each observation is classified as having or not having a trait. Basically, there are three steps:
- Step 1. Generate X from the proposal density g [X ~ g(x)]
- Sample a point (an x-position) from the proposal distribution g and draw a vertical line at this point. The target density f has an upper bound c g(x) for all x, where c is a constant and c >= 1.
- Step 2. Generate U from the uniform distribution on the interval (0, c g(X))
- Sample uniformly along the vertical line at the x-position (i.e. uniformly from 0 to the height of the envelope c g(X)). Equivalently, draw U ~ runif(0, 1) and compare it with f(X)/(c g(X)).
- Step 3. If U <= f(X), then accept X as a draw from f ("accept"); otherwise reject it and repeat Steps 1 and 2
Pr(accept|X) = f(x)/(c g(x))
Pr(X) = g(x)
Pr(accept) = 1/c
therefore, by Bayes' rule, Pr(X|accept) = Pr(accept|X) Pr(X) / Pr(accept) = [f(x)/(c g(x))] g(x) c = f(x)
Example: Stata simulation and define the event
clear
set seed 770488
set obs 1000
gen x = runiform() - .5
gen z = runiform() - .5
gen xb = x + 8*z
gen y = 1 / (1 + exp(xb)) < runiform() // y defined as 0 or 1
logistic y x z
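The three steps of the accept-reject algorithm described above can also be sketched in R. This is a minimal illustration under assumptions that are not in the original post: the target f is a Beta(2, 2) density, the proposal g is Uniform(0, 1), and c = 1.5 is used because the Beta(2, 2) density peaks at 1.5. The seed is reused from the Stata example only for convenience.

```r
set.seed(770488)

f <- function(x) dbeta(x, 2, 2)   # assumed target density, maximum 1.5 at x = 0.5
g <- function(x) dunif(x)         # assumed proposal density on (0, 1)
c_const <- 1.5                    # envelope constant: f(x) <= c * g(x) for all x

n_prop <- 10000
x <- runif(n_prop)                        # Step 1: draw X from the proposal g
u <- runif(n_prop, 0, c_const * g(x))     # Step 2: draw U uniformly on (0, c * g(X))
keep <- u <= f(x)                         # Step 3: accept X when U <= f(X)
accepted <- x[keep]

accept_rate <- mean(keep)         # should be close to Pr(accept) = 1/c
print(c(accept_rate = accept_rate, expected = 1 / c_const, mean_x = mean(accepted)))
```

The observed acceptance rate gives a quick check of the result Pr(accept) = 1/c, and the accepted draws should have a mean near 0.5, the mean of Beta(2, 2).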
Monday, December 21, 2015
ggplot2 2.0.0
I have used the ggplot2 package for a while and really like it. I'm happy to see that Hadley Wickham has officially updated ggplot2 to version 2.0.0. On the RStudio Blog, Hadley highlights several important changes:
- ggplot2 now has an official extension mechanism.
- There are a handful of new geoms, and updates to existing geoms.
- The default appearance has been thoroughly tweaked so most plots should look better.
- Facets have a much richer set of labelling options.
- The documentation has been overhauled to be more helpful, and require less integration across multiple pages.
- A number of older and less used features have been deprecated.
You can find the documentation/manual on the project website. I often go to the dev website to find the latest documentation/vignettes (extension, aesthetic specifications, themes).
The R Graphics Cookbook by Winston Chang is a must-have book for learning ggplot2 and becoming an expert user. You can find the code here from the Cookbook-R, and the Google book here.