Wednesday, April 05, 2017

R! related

Blog: R Documentation and Learning Resources

IDE and GUI
  • RStudio is an IDE for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. RStudio Webinars
  • Rcmdr is a GUI of R.
  • esquisse is a great addin for ggplot2
  • Deducer is a good but relatively old GUI for exploring data, like JMP, with ggplot2 behind it (Plot Builder). JGR is a Java GUI for R. (ggplot2 – much easier with JGR and Deducer). To use Deducer, run install.packages(c("JGR", "Deducer", "DeducerExtras")), then submit library(JGR) and JGR(); then, in the JGR console, load Deducer by going to 'Packages & Data' > 'Package Manager' and selecting Deducer and DeducerExtras. More: all-in-one installing JGR and Deducer. Notes: 1) set the Java location using options(java.home="xxx/Java/"). 2) rJava ver 9.6 needs to run under 32-bit R! on my computers.
  • GrapheR (pdf) is another GUI for drawing customized graphs without knowing any R commands.
  • Tessera - Open source environment for deep analysis of large complex data (Divide and recombine)
  • The application Bio7 is an integrated development environment for ecological modelling and contains powerful tools for model creation, scientific image analysis (ImageJ) and statistical analysis.
Communication between SAS and R


Graphical parameters


R! How can I include Greek letters in my plot labels?
Revolutions: How to make a heat map in R, Superheat: supercharged heatmaps for R
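For the Greek-letter question, here is a minimal sketch using base R plotmath. It assumes nothing beyond base R; the value beta_hat is made up for illustration. expression() builds a fixed label, and bquote() splices a computed value into one.

```r
# Greek letters in plot labels with plotmath (base R only)
x <- seq(0, 2 * pi, length.out = 50)
beta_hat <- 0.5   # hypothetical estimate, for illustration only

xlab_greek  <- expression(theta ~ "(radians)")    # Greek theta followed by plain text
ylab_greek  <- expression(sin(theta))
title_greek <- bquote(hat(beta) == .(beta_hat))   # .() inserts the value of beta_hat

pdf(NULL)  # null device so nothing is written to disk; drop this line to plot on screen
plot(x, sin(x), type = "l",
     xlab = xlab_greek, ylab = ylab_greek, main = title_greek)
invisible(dev.off())
```

See ?plotmath for the full list of symbols that can appear inside expression().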

    Chart Chooser: improves Excel and PowerPoint charts. There is an R! version of Chart Chooser (not many charts on the site, but the idea is great).


      Packages (rdrr.io, CRAN, Rdocumentation, Inside-R, Quick-R, Bioconductor)

      Friday, March 24, 2017

      R functions and keyboard shortcuts

      functions/commands and keyboard shortcuts
      R! is powerful and has rich packages and functions. It's impossible to build a list of functions/shortcuts to fit the purposes of all. Below are some functions/shortcuts related to my projects.
      • Cheatsheets, The R Guide, R Reference Card
      • Help functions: help()/?, apropos() finds all objects whose names match a pattern, find() gives the locations of found objects, methods(), example(), demo(), vignette(), args()
      • Housekeeping functions: getwd(), setwd(), rm(list=ls()) removes all objects in the R environment, source("myRscript.r") runs the R code in the "myRscript.r" file, fix() modifies the original object, while edit() edits an object and returns a new object, download.file() downloads a file from the Internet, attach()/detach() objects, search() shows the current search path and its order, install.packages(), update.packages(), remove.packages(), getOption("defaultPackages") which can be changed by setting the option in startup code (e.g. in ~/.Rprofile), .libPaths()
      • Numeric/character functions: length(), seq(), rep(), cut(), pretty(), cat(), substr(), grep(), sub(), strsplit(), paste(), toupper(), tolower()
      • Data functions: read.table(), head(), tail(), str(), class(), length(), dim(), nrow(), ncol(), names(), levels(), c(), cbind(), rbind(), append(), rep(), rev(), sort(), unique()
      • Type functions: "is." for checking or "as." for conversion + numeric(), character(), vector(), matrix(), data.frame(), factor(), logical(), integer(). For example: is.numeric(), as.numeric()
      • Mathematical functions: abs(), sqrt(), log(), log(x, base=n), log10(), exp(), prod(), factorial(), choose(), ceiling(), floor(), solve(), trunc(), round(), signif(), cos(), sin(), tan(), acos()
      • Statistical functions: mean(), median(), sd(), var(), mad(), quantile(), range(), sum(), diff(), min(), max(), scale(), fivenum(), cumsum(), cumprod(), cummax(), cummin(), cor(), colSums(), rowSums(), colMeans(), rowMeans()
      • Probability functions: the form is [d][p][q][r]distribution(). d, p, q, r are for (d)ensity, cumulative (p)robability/distribution function, (q)uantile function, and (r)andom generation, respectively. The distribution types can be: (norm)al, (beta), (binom)ial, (chisq)uared, (exp)onential, (logis)tic, (multinom)ial, (n)egative (binom)ial, (pois)son, (f), (gamma), (t), (unif)orm, etc. For example: dnorm(), pnorm(), qnorm(), rnorm()
      • Statistical modeling functions
        • Model functions: lm(), glm(), nls(), nls2(), lme() / nlme()
        • Symbol formulas (y ~ A + B + C): ":" is for an interaction term, "*" is for complete interaction (main effects plus their interaction), "^" is for crossing to a specified degree, "." is a placeholder for all other variables except the dependent variable, "-" removes a variable from the equation, "-1" suppresses the intercept, "I()" has elements within the parentheses interpreted arithmetically
        • Post-estimation functions: coef(), confint(), resid(), fitted(), summary(), predict(), deviance(), print(), plot(), formula(), anova(obj1, obj2), AIC(), vcov()
        • Contrast functions: contr.helmert(), contr.poly(), contr.sum(), contr.treatment(), contr.SAS()
      • RStudio is an integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. Shortcuts (you can modify them: Tools -> Modify Keyboard Shortcuts...)
        • Alt + Shift + K: Show a Quick Reference
        • Alt + -: Insert assignment operator "<-"
        • Ctrl + Shift + M: Insert pipe operator "%>%" (I changed it to Ctrl + Shift + P)
        • Ctrl + Alt + I: Insert chunk (R Notebook/Markdown)
        • Ctrl + 1: Move cursor to source Editor window
        • Ctrl + 2: Move cursor to Command window
        • Ctrl + 3: Move cursor to Help window
        • Ctrl + 4: Move cursor to History window
        • Ctrl + 5: Move cursor to File window
        • Ctrl + 6: Move cursor to Plots window
      • ...
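A quick sketch of two items from the list above: the d/p/q/r probability prefixes and the formula symbols. It uses only base R; the data frame d and its columns are made up for illustration. Looking at the column names of model.matrix() is one easy way to see what a formula expands to.

```r
# d/p/q/r prefixes, shown on the normal distribution
dnorm(0)        # density at 0, about 0.3989
pnorm(1.96)     # cumulative probability, about 0.975
qnorm(0.975)    # quantile, about 1.96
set.seed(42)
draws <- rnorm(5)   # five random draws

# A small made-up data set to see how formula symbols expand
d <- data.frame(y = rnorm(8), A = gl(2, 4), B = gl(2, 2, 8), x = 1:8)

cols_star  <- colnames(model.matrix(y ~ A * B, d))           # main effects plus interaction
cols_colon <- colnames(model.matrix(y ~ A + B + A:B, d))     # same model written out with ":"
cols_pow   <- colnames(model.matrix(y ~ (A + B + x)^2, d))   # all terms up to two-way
cols_dot   <- colnames(model.matrix(y ~ . - x, d))           # everything except y, then drop x
cols_I     <- colnames(model.matrix(y ~ I(x^2) - 1, d))      # arithmetic inside I(), no intercept

cols_star
cols_dot
cols_I
```

Here cols_star and cols_colon come out identical, which is the point of "*" being shorthand for main effects plus ":" terms.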

      Monday, March 13, 2017

      choice of analytical language

      Choice of analytical language
      I have used mainly three statistical languages, Stata, R, and SAS, for many years for different purposes. My relative usage of those three languages has shifted from SAS-Stata-R to SAS-R-Stata, and then to Stata-R-SAS. Sometimes I am asked to recommend a better analytic language, which is always a hard and complicated question for me. I came across a blog post written by Curtis Miller, which is very thoughtful and helpful for making this kind of choice. Here is his post: "On Programming Languages; Why My Dad Went From Programming to Driving a Bus". Hopefully his story can help you make your own decision.

      Wednesday, March 08, 2017

      Stata News: in the spotlight

      Stata News: in the spotlight etc.

      Friday, March 03, 2017

      Syndemics: health in context

      A syndemic, a term coined by Merrill Singer in the mid-1990s, is a conceptual framework for understanding diseases or health conditions that arise in populations and that are exacerbated by the social, economic, environmental, and political milieu in which a population is immersed. Today's issue of The Lancet published a series related to syndemics... full text ...

      Tuesday, January 24, 2017

      Tuesday, January 03, 2017

      Cheng YJ, Gregg EW, Rolka DB, Thompson TJ.

      BACKGROUND:

      Monitoring national mortality among persons with a disease is important to guide and evaluate progress in disease control and prevention. However, a method to estimate nationally representative annual mortality among persons with and without diabetes in the United States does not currently exist. The aim of this study is to demonstrate use of weighted discrete Poisson regression on national survey mortality follow-up data to estimate annual mortality rates among adults with diabetes.

      METHODS:

      To estimate mortality among US adults with diabetes, we applied a weighted discrete time-to-event Poisson regression approach with post-stratification adjustment to national survey data. Adult participants aged 18 or older with and without diabetes in the National Health Interview Survey 1997-2004 were followed up through 2006 for mortality status. We estimated mortality among all US adults, and by self-reported diabetes status at baseline. The time-varying covariates used were age and calendar year. Mortality among all US adults was validated using direct estimates from the National Vital Statistics System (NVSS).

      RESULTS:

      Using our approach, annual all-cause mortality among all US adults ranged from 8.8 deaths per 1,000 person-years (95% confidence interval [CI]: 8.0, 9.6) in year 2000 to 7.9 (95% CI: 7.6, 8.3) in year 2006. By comparison, the NVSS estimates ranged from 8.6 to 7.9 (correlation = 0.94). All-cause mortality among persons with diabetes decreased from 35.7 (95% CI: 28.4, 42.9) in 2000 to 31.8 (95% CI: 28.5, 35.1) in 2006. After adjusting for age, sex, and race/ethnicity, persons with diabetes had 2.1 (95% CI: 2.01, 2.26) times the risk of death of those without diabetes.

      CONCLUSION:

      Period-specific national mortality can be estimated for people with and without a chronic condition using national surveys with mortality follow-up and a discrete time-to-event Poisson regression approach with post-stratification adjustment. (Full text)
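To give a feel for the discrete time-to-event Poisson idea in the abstract above, here is a rough sketch on simulated data. It is not the paper's code or data: every number, variable name, and the hazard function are assumptions made up for illustration, and it uses plain glm() with weights, so it leaves out the post-stratification adjustment and the survey-design variance estimation that a real analysis would need.

```r
set.seed(2017)
n <- 2000

# Simulated baseline survey: one row per person (all values are made up)
persons <- data.frame(
  id       = seq_len(n),
  year0    = sample(1997:2004, n, replace = TRUE),  # interview year
  age0     = sample(30:80, n, replace = TRUE),      # age at interview
  diabetes = rbinom(n, 1, 0.10),                    # self-reported status
  wt       = runif(n, 0.5, 2)                       # stand-in sampling weight
)

# Person-period expansion: one row per person per calendar year through 2006
n_years <- 2006 - persons$year0 + 1
pp <- persons[rep(seq_len(n), times = n_years), ]
pp$year <- pp$year0 + sequence(n_years) - 1
pp$age  <- pp$age0 + (pp$year - pp$year0)           # age varies over time

# Simulate a death indicator per person-year, then drop rows after the first death
hazard  <- exp(-9 + 0.08 * pp$age + 0.7 * pp$diabetes)
pp$died <- rbinom(nrow(pp), 1, hazard)
prior_deaths <- ave(pp$died, pp$id, FUN = function(d) cumsum(d) - d)
pp <- pp[prior_deaths == 0, ]

# Weighted Poisson regression on the person-year records
fit <- glm(died ~ age + diabetes + factor(year),
           family = poisson(), weights = wt, data = pp)
rate_ratio <- exp(coef(fit)[["diabetes"]])          # diabetes vs. no diabetes

# Weighted crude deaths per 1,000 person-years, by calendar year
crude <- tapply(pp$died * pp$wt, pp$year, sum) /
         tapply(pp$wt, pp$year, sum) * 1000
round(crude, 1)
rate_ratio
```

Each record contributes one person-year, so no exposure offset is needed here; with unequal follow-up within a period, an offset(log(time)) term would be the usual addition.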

      Wednesday, November 30, 2016

      Tuesday, November 29, 2016

      Interview with J.J. Allaire

      Interview with J.J. Allaire - the founder of RStudio
      by Joseph Rickert
      Welcome to “R Views”, the new R Community blog from RStudio. For this first post, I sat down with J.J. Allaire, RStudio’s founder and CEO, to discuss RStudio’s history, its mission and JJ’s vision for its future. In a short time, we touched on a wide range of subjects including RStudio’s business, the growth of the R language, the importance of the R Consortium to the R Community and J.J.’s advice to anyone coming to R for the first time. We hope you enjoy this “snapshot” of RStudio’s place in the R world. full text
      You can also read a Chinese version here.

      Thursday, October 13, 2016

      'Big Fat Fix' Film Challenges Mediterranean Diet

      An Interview With Cardiologist Aseem Malhotra
      Editor's Note:  Cardiologist Aseem Malhotra, MBChB, MRCP, talks about his new documentary The Big Fat Fix, which sent him to Pioppi, Italy, the village where Ancel Keys researched diet and cardiovascular health. A regular contributor to the BMJ and major UK newspapers on the topic of dietary health, Dr Malhotra believes that the demonization of fat let sugar off the hook as the real culprit in the diabetes, obesity, and cardiovascular disease epidemic, and that we need to rethink our approach to exercise. ... Full Text.


      This article is another interesting opinion, based on facts and viewed from a different angle. The interview reminds me of Michael Pollan's book In Defense of Food, published in 2008: Food – Not Nutrients – Is The Fundamental Unit In Nutrition. (PBS Documentary In Defense of Food in Dec. 2015, PBS Newshour and on YouTube).
      Food Insight (2015). 4 Food Rules You Won’t Find in Michael Pollan’s ‘In Defense of Food’

      Wednesday, October 12, 2016

      Microbiome: Fibre for the future

      Nature: Eric Martens
      A chronic lack of dietary fibre has been found to reduce the diversity of bacteria in the guts of mice. This effect is not fully reversed when fibre is reintroduced, and increases in severity over multiple generations. ... Full text

      Battle of the data science Venn Diagrams

      Battle of the Data Science Venn Diagrams
      by David Taylor    
      Data science is a rather fuzzily defined field; some of the definitions I've heard are:
      • "Work that takes more programming skills than most statisticians have, and more statistics skills than a programmer has."
      • "Applied statistics, but in San Francisco."
      • "The field of people who decide to print 'Data Scientist' on their business cards and get a salary bump."
      Personally, I've recently decided to avoid the controversy by calling myself a data spelunker. (Data miners are out of vogue anyway.)
      As a field in search of a definition, it's unsurprising that you can find a lot of different attempts to define it.
      As a field full of data nerds with a penchant for visualization, it's also unsurprising that a lot of them use Venn diagrams. (Fun fact: John Venn, who invented the eponymous diagrams, and his son filed a patent in 1909 for a lawn bowling machine.)... Full Text

      Saturday, October 01, 2016

      Stata: Get out-of-sample file predictions

      Example:
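           webuse auto,clear
           regress mpg weight foreign        // fit the model on the estimation sample
           est store regxb                   // keep the estimates under a name

           preserve
             webuse newautos,clear           // a different file with the same predictors
             est restore regxb               // bring the stored estimates back
             predict mpg                     // out-of-sample linear prediction
             list
           restore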

      Thursday, September 01, 2016

      Stata: display system date

      • .di "system date:" c(current_date)
      • .di "system date:" "$S_DATE"
      • .di %td_CY-N-D  date("`c(current_date)'","DMY") // "` '" are not necessary
      • .di %td_CY-N-D  date("$S_DATE","DMY")
      • .di "system year: " year(date(c(current_date),"DMY")) // w/o `' around c(current_date)
      • .di "system month: " month(date(c(current_date),"DMY"))
      • .di "system day:" day(date(c(current_date),"DMY"))
      • .di "system year: " year(date("$S_DATE","DMY"))
      • .di "system month: " month(date("$S_DATE","DMY"))
      • .di "system day:" day(date("$S_DATE","DMY"))
      •  more examples:
        • Works ('local' with '=')
          • local dd=day(date(c(current_date),"DMY"))
          • local mm=month(date(c(current_date),"DMY"))
          • local yy=year(date(c(current_date),"DMY"))
          • log using "output_`yy'_`mm'_`dd'.log", replace
          • log close
        • Doesn't work ('local' without '=')
          • local dd day(date(c(current_date),"DMY"))
          • local mm=month(date(c(current_date),"DMY"))
          • local yy=year(date(c(current_date),"DMY"))
          • log using "output_`yy'_`mm'_`dd'.log", replace // invalid 'DMY' r(198)
          • log close
        • Works ('global' with '=')
          • global dd=day(date(c(current_date),"DMY"))
          • global mm=month(date(c(current_date),"DMY"))
          • global yy=year(date(c(current_date),"DMY"))
          • log using "output $yy-$mm-$dd.log", replace
          • log close

      Sunday, July 31, 2016

      Recycle/reuse returned results in Stata

      • UCLA: "How can I access information stored after I run a command in Stata (returned results)?" 
      • The Stata Blog: Drukker (2015). Programming an estimation command in Stata: Where to store your stuff
      • Stackoverflow (2014).Saving coefficients and standard errors as variables
      • Lembcke (2009). Advanced Stata Topics
      • SSCC. An Introduction to Mata
      • Stata commands are grouped into 4 major categories: r-class, e-class, s-class, and n-class commands. In addition, c-class holds the values of system parameters and settings, along with certain constants.
      • Commands that produce statistical results are either r-class or e-class. e-class commands produce estimation results; the others belong to r-class.
      • After submitting "contrast", Stata generates an L matrix (r(L)); you can check the contrast coefficients using "matrix list r(L)".
      • If you don't know what results are returned, use "return list" or "ereturn list" to find them. Scalar results from an r-class command can be used as "r(...)" and scalar results from an e-class command as "e(...)". Here, "..." is the name shown by "return list" or "ereturn list". Using results stored in matrix form is a little tricky: "_b[...]" or "_se[...]" has to be used; here, "..." is the name of a coefficient in the model. The results for the constant are "_b[_cons]" for the beta coefficient and "_se[_cons]" for the standard error. Matrix results can also be copied into a matrix: "mat B=e(b)", then "disp B[rowno, colno]".
      • To show the variance-covariance matrix, use "estat vce" or simply "matlist e(V)"; to show correlations, use "estat vce, correlation".
      • You can "estimate store" and "estimate restore" a set of estimates under a name in memory, so that they are not overwritten by subsequent commands. If you want to save and use them as a permanent file, use "estimate save" and "estimate use".
      • A single number can be converted into a scalar, for example, "scalar xyz=_b[agecat]". However, the scalar should be referenced with the pseudofunction scalar(), for example, "display scalar(xyz)" (more info)
      • The e(V) and e(b) matrices can be converted into variables of a dataset using "svmat" (convert variables into a matrix using "mkmat"), which is similar to "putmata and getmata" of Mata (matrix ref.):
        • mat D = e(b)', e(b)'
        • svmat double D, name(coef)
        • mat se1=vecdiag(e(V))' // transposed to a column like D; these are variances (squared SEs)
        • mat se2=vecdiag(e(V))'
        • mat SE = se1, se2
        • svmat SE, name(se)
      • The "ereturn display" can use the e(V) and e(b) matrices to return an r-class matrix "r(table)"
      • "margins" with the "post" option also gives e-class results:
        • webuse dollhill3,clear
        • poisson deaths i.smokes##c.agecat, exposure(pyears)
        • est store tempreg
        • margins smokes, gen(dhat) predict(ir) // undocumented gen()
        • mean dhat1 // for smokes = 0
        • scalar dhat1=_b[dhat1] // .00810452
        • margins smokes, eydx(agecat) predict(ir) post
        • scalar eydxsmokes0=_b[0.smokes] // 1.046826
        • est restore tempreg
        • margins smokes, dydx(agecat) predict(ir) post
        • scalar dydxsmokes0=_b[0.smokes] // .00848402
        • disp scalar(dydxsmokes0)/scalar(dhat1) // gives 1.046826
      • Gould(2010).Mata Matters; (2011).Mata, the missing manual. Baum(2009).Using Mata to work more effectively in Stata
      • putmata and getmata - Put Stata variables into Mata and vice versa
        • mata r2=(1\2\3)
        • mata b=st_matrix("e(b)")'
        • mata se=sqrt(diagonal(st_matrix("e(V)")))
        • getmata r2 b se, force
        • vwls b r2, sd(se)
        • reg b r2
      • Rename "rowname" and "colname" of a matrix
           program estmatrename, eclass
             matrix BB = e(b)
             matrix colnames BB = "1.race" "2.race" "3.race"
             ereturn repost b = BB, rename
             matrix VV = e(V)
             matrix colnames VV = "1.race" "2.race" "3.race"
             matrix rownames VV = "1.race" "2.race" "3.race"
             ereturn repost V = VV
           end

        • total heartatk [pw=swgt], over(race)
        • estmatrename
        • lincom (_b[3.race]-_b[1.race])/2
        • test _b[1.race]=_b[2.race]
        • contrast {race 1 -1 0}
        • contrast p(1).race
      • Convert ln(RR) into RR and percent change
        • webuse dollhill3
        • poisson deaths smokes i.agecat,exposure(pyears) irr
        • margins agecat, predict(ir) post
        • qui nlcom (lnRR21:ln((_b[2.agecat]/_b[1.agecat])))(lnRR31:ln((_b[3.agecat]/_b[1.agecat]))) (lnRR41:ln((_b[4.agecat]/_b[1.agecat]))), post
        • ereturn disp,eform(RR) cformat(%5.2f) pformat(%5.4f)
        • mat rtable=r(table)'
        • mat RR=rtable[1...,"b"],rtable[1...,"ll".."ul"]
        • mata st_matrix("pctable",(st_matrix("RR"):-1):*100)
        • mat coln pctable=RR LL UL
        • matlist pctable, format(%10.2f)r
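The arithmetic behind the last block is simple: RR = exp(lnRR), and percent change = (RR - 1) * 100, applied to the point estimate and to both confidence limits. A small R sketch of the same steps, using made-up log rate ratios and standard errors (not the dollhill3 results):

```r
# Hypothetical log rate ratios and standard errors, for illustration only
est <- data.frame(
  term = c("lnRR21", "lnRR31", "lnRR41"),
  b    = c(0.50, 1.00, 1.50),
  se   = c(0.10, 0.12, 0.15)
)

z <- qnorm(0.975)                      # normal-based 95% limits on the log scale

est$RR <- exp(est$b)                   # back-transform the point estimate
est$LL <- exp(est$b - z * est$se)      # lower limit
est$UL <- exp(est$b + z * est$se)      # upper limit

# Percent change relative to the reference group
pct <- (est[, c("RR", "LL", "UL")] - 1) * 100
rownames(pct) <- est$term
round(pct, 2)
```

The confidence limits are computed on the log scale first and exponentiated afterwards, which keeps the lower limit of RR above zero.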

      Monday, May 23, 2016

      The 21 greatest graduation speeches of the last 60 years

      Vox: The 21 greatest graduation speeches of the last 60 years
      by German Lopez on May 11, 2016
      "Graduation speeches are the last opportunity for a high school or college to educate its students. It's unsurprising, then, that these institutions often pull in some of the world's most powerful people to leave an equally powerful impression on their students. Here are the best of those speeches and some of the sections that resonate the most..." (May 11, 2016)
      Read and watch the full article on the Vox website here.

      Sunday, May 01, 2016

      R! Books
