Blog: R Documentation and Learning Resources
IDE and GUI
- RStudio is an IDE for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. See also: RStudio Webinars.
- Rcmdr (R Commander) is a GUI for R.
- esquisse is a great RStudio addin for ggplot2.
- Deducer is a good but relatively old GUI for exploring data, similar to JMP, with ggplot2 behind its Plot Builder. JGR is a Java GUI for R (ggplot2 – much easier with JGR and Deducer). To use Deducer, install the packages with install.packages(c("JGR", "Deducer", "DeducerExtras")), then run library(JGR) and JGR(); in the JGR console, load Deducer via 'Packages & Data' > 'Package Manager' and select Deducer and DeducerExtras. More: all-in-one installing JGR and Deducer. Notes: 1) set the Java location with options(java.home="xxx/Java/"). 2) On my computers, rJava ver 9.6 had to run under 32-bit R.
- GrapheR (pdf) is another GUI for drawing customized graphs without knowing any R commands.
- Tessera - Open source environment for deep analysis of large complex data (Divide and recombine)
- The application Bio7 is an integrated development environment for ecological modelling and contains powerful tools for model creation, scientific image analysis (ImageJ) and statistical analysis.
- haven allows you to load foreign data formats (SAS, SPSS and Stata).
- SAS/IML can call functions in the R language (pdf). The R interface subroutines run inside PROC IML: PROC IML; RUN ExportDatasetToR("xxx.mydata"); SUBMIT / R; ... R code ... ENDSUBMIT; RUN ImportDataSetFromR("WORK.xxx", "xxx"); QUIT;
- Bewerunge (2011). Call R function from SAS (pdf)
- Plot symbols
- plot symbols (pch) and their size (cex)
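A quick way to see the symbols is to draw a reference card yourself. This is a minimal sketch using only base graphics: pch 0 to 25 are the standard symbols, cex scales their size, and bg fills the symbols 21 to 25. The function name show_pch() is hypothetical, just for illustration.

```r
# Draw the 26 standard plot symbols (pch = 0 to 25) to a PDF, each labelled with its code.
show_pch <- function(file = tempfile(fileext = ".pdf")) {
  pdf(file, width = 8, height = 2.5)       # base grDevices PDF device
  par(mar = c(1, 1, 1, 1))
  codes <- 0:25
  plot(codes, rep(1, length(codes)),
       pch = codes,                        # symbol code
       cex = 2,                            # symbol size multiplier
       bg = "orange",                      # fill colour, used only by pch 21 to 25
       axes = FALSE, xlab = "", ylab = "", ylim = c(0.5, 1.3))
  text(codes, rep(0.7, length(codes)), labels = codes, cex = 0.8)
  dev.off()
  invisible(file)
}

out <- show_pch()
```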
- Color
- Pick a color: a color-picker site built in R with Shiny
- R Color Cheat Sheet, Color, palette (ggplot2: ggplot2 color, Cookbook for R), chart of R colors.
- RColorBrewer provides palettes for drawing nice maps shaded according to a variable.
- wesanderson provides palettes derived from the Tumblr blog Wes Anderson Palettes.
- Simple Statistics: Colors in R
- Glynn (2007). Using color in R
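Before reaching for RColorBrewer or wesanderson, base R already has a handful of color helpers. The sketch below only uses grDevices (attached by default); hcl.colors() assumes R 3.6.0 or later.

```r
# All functions below come from grDevices, which R attaches by default.
n_named <- length(colors())                # number of built-in colour names
col2rgb("steelblue")                       # named colour -> red/green/blue values (0-255)
rgb(70, 130, 180, maxColorValue = 255)     # RGB -> hex string "#4682B4"
pal <- hcl.colors(5, palette = "viridis")  # 5-colour sequential palette (R >= 3.6.0)
adjustcolor("steelblue", alpha.f = 0.5)    # same colour, 50% transparent
barplot(1:5, col = pal)                    # use the palette in a base plot
```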
- CRAN Task View organizes the packages into topic groups such as Graphics, Survival Analysis, etc.
- margins: An introduction to 'margins'
- Transition from Excel to R:
- DT: An R interface to the DataTables library
- excelR: An R interface to jExcel library to create web-based interactive tables and spreadsheets compatible with Excel or any other spreadsheet software
- DTedit: Editable DataTables for shiny apps
- rhandsontable is an htmlwidget based on the handsontable.js library.
- formattable is designed for applying formatting on vectors and data frames
- rpivotTable is an R wrapper for the great library pivottable
- Zelig is a general-purpose statistics program for estimating, interpreting, and presenting results from any statistical method. It turns the power of R, with its free-ranging syntax, diverse examples, and documentation written for different audiences, into the same three commands and consistent documentation for every method.
- Report and documentation
- The knitr package is designed to generate dynamic reports with R. See also: chunk options.
- rmarkdown is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R.
- bookdown facilitates writing books and long-form articles/reports with R Markdown. Several very good free books are published on the bookdown website.
- Programming
- Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation.
- Data I/O and manipulation
- data.table package provides an enhanced version of data.frame. Here is the manual and help. It's a star package on r-pkg.org.
- rio is a Swiss-army knife for data I/O.
- readr makes it easy to read many types of tabular data. read_fwf() is a powerful function for reading fixed-width files, for example the NCHS mortality file.
- janitor has simple functions for examining and cleaning dirty data.
- sqldf is an R package for running SQL statements on R data frames. By default sqldf uses an SQLite database, so queries follow SQLite syntax.
- xlsx gives programmatic control of Excel files using R.
- read.table() reads a file in table format and creates a data frame from it.
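A small sketch of read.table(): the text = argument reads from a string instead of a file, which is handy for quick tests. The column names and values here are made up for illustration.

```r
# Whitespace-delimited text with a header row (hypothetical example data).
txt <- "
id  group  score
1   A      3.5
2   B      4.0
3   A      2.5
"
# header = TRUE takes column names from the first non-blank line;
# blank lines are skipped by default.
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(df)
```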
- arrow supports analyzing large, multi-file datasets and working with individual Parquet and Feather files.
- MMWRweek: Convert dates to MMWR Day, Week, and Year
- dlookr: Diagnose, explore and transform data
- foreign reads data stored by other statistical systems, such as SPSS, Stata, SAS, and Minitab.
- dplyr is a new package which provides a set of tools for efficiently manipulating datasets in R. Introduction to dplyr. Notes: "Both data.table and dplyr were able to reduce the problem to less than a few seconds. If you're looking for pure speed data.table is the clear winner. However, it is my understanding that data.table's syntax can be frustrating, so if you're already used to the 'Hadley ecosystem' of packages, dplyr is a formidable alternative, even if it is still in the early stages."
- tidyr, with spread() and gather(), is a reframing of reshape2 that makes it easy to "tidy" your data. In current tidyr, spread() and gather() are superseded by pivot_wider() and pivot_longer().
- magrittr provides the forward-pipe operator %>%. magrittr: Simplifying R code with pipes. History of magrittr.
- Aggregation and restructuring: the reshape2 package provides melt() and dcast()/acast(). Transpose with t(); aggregate data with aggregate().
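The two base-R functions mentioned above can be sketched with the built-in mtcars data: aggregate() summarises a column by group, and t() transposes a matrix. The choice of data and summary (mean mpg by cylinder count) is just an illustration.

```r
# aggregate(): mean miles-per-gallon for each cylinder count (formula interface).
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# t(): transpose a matrix; a data frame would be coerced to a matrix first.
m  <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column
tm <- t(m)                    # 3 x 2
dim(tm)
```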
- lubridate is good for manipulating time and date.
- stringr makes it easier to work with strings (article).
- Quick-R examples of outputting R data.
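For writing data out, a minimal base-R sketch: a CSV round trip with write.csv()/read.csv(), and an .rds round trip with saveRDS()/readRDS(), which keeps R column types exactly. Temporary files are used so nothing is left behind.

```r
# CSV: portable, but column types are re-guessed on reading.
csv_file <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_file)                  # row names are written as the first column
back <- read.csv(csv_file, row.names = 1)    # ...and restored from that column here

# RDS: R's own serialized format, preserves the object exactly.
rds_file <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_file)
back_rds <- readRDS(rds_file)
```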
- RStudio addins: datapasta, ggthemeassist
- pdftools: Text Extraction, Rendering and Converting of PDF Documents (example)
- Survey related packages
- survey (Thomas Lumley) provides facilities in R for analyzing data from complex surveys. Complex sampling and R (pdf). Analyze US Government Survey Data with R. You can use "survey" to create replicate weights, and withReplicates() to compute variances by replicate weighting.
- rpms is for Recursive Partitioning for Modeling Survey data
- lavaan.survey is for complex survey structural equation modeling (SEM). An R Package for Complex Survey Analysis of Structural Equation Models (pdf)
- sampling includes many different algorithms for drawing survey samples and calibrating the design weights.
- convey is a package for estimating indicators of income concentration and poverty (including the Gini coefficient, the Atkinson index, the at-risk-of-poverty threshold, etc.), built on Thomas Lumley's survey package, by Anthony Damico. He has worked for years on "analyze survey data for free with R language" (asdfree).
- GIS
- Pebesma: Spatial Data Science with applications in R
- Methods Bites: Using Geospatial Data in R
- Pebesma: Simple Features for R
- R-spatial.org: Drawing beautiful maps programmatically with R, sf and ggplot2, Part 1, Part 2, Part 3
- geofacet: Introduction to geofacet
- ggmap makes it easy to visualize spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2. German Gas Prices illustrated, Cheatsheet
- Robin. Introductory tutorial on graphical display of geographical information in R
- Other packages: maps (draw geographical maps), maptools (reading and handling spatial objects), mapdata (extra map databases), scales (scale functions for visualization), mapproj (map projections), RgoogleMaps (overlays on static maps)
- tidycensus is an R package that allows users to interface with the US Census Bureau's decennial Census and five-year American Community Survey APIs
- Data Visualization and Graphics
- The R graph Gallery
- Auguie: Laying out multiple plots on a page
- patchwork: to arrange your graphs easily
- ggplot2 (document) (You can build the book here: ggplot2: Elegant graphics for data analysis) is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. Cookbook for R is a great book. Tutorial websites: Tutorial I, Tutorial II. A good book website of Guide to Create Beautiful Graphics in R (Cheatsheet: Be Awesome in ggplot2). Beautiful plotting in R: A ggplot2 cheatsheet (pdf). Tutorial video (Part I and Part II) by Roger Peng
- ggplot2 extensions gallery
- ggThemeAssist is an RStudio addin that provides a GUI for editing ggplot2 themes
- esquisse is an addin that allows you to interactively explore your data by visualizing it with the ggplot2 package.
- cowplot and ggstatsplot are add-on packages of ggplot2
- ggpubr provides functions for creating publication-ready figures
- You may use ggplot_build() and ggplot_gtable() to turn a plot into its grid objects (grobs), e.g. grobs <- ggplot_gtable(ggplot_build(yourgraph)), then use grid.draw() from the grid package to draw them individually. Hadley Wickham has pages on Mixing ggplot2 graphs with other graphical output and Editing raw Grid objects from a ggplot.
- ggedit is used to facilitate ggplot formatting
- A conversation with Hadley Wickham 2014 - Eduardo
- ggthemes is a package including some extra geoms, scales, and theme styles (the Economist, Excel, Stata, etc.) for ggplot2.
- GGally is designed to be a helper to ggplot2. It contains templates for different plots to be combined into a plot matrix, a parallel coordinate plot function, as well as a function for making a network plot.
- gtable is a package of tools to make it easier to work with "tables" of grobs, which is used internally by ggplot2. I used gtable::cbind(..., size = "max", z = NULL) or gtable::rbind(..., size = "max", z = NULL) to match the sizes of plots.
- ggrepel is a ggplot2 extension to avoid overlapping text labels.
- Walker (2014). International population pyramids with ggplot2
- ggforce accelerates 'ggplot2'
- ggraph is an extension of ggplot2 tailored at plotting graph-like data structures (graphs, networks, trees, hierarchies...)
- WVPlots Provides examples of ggplot plots that can be generated from a standard calling interface
- classifierplots generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call
- ggalt: make a dumbbell plot in ggplot2
- ggcharts aims to get you to your desired plot faster.
- Plotly: ggplotly() in the plotly package is a powerful tool to convert ggplot2 plots into interactive, online charts with D3.js. Carson Sievert used this converter to recreate Hadley Wickham's entire ggplot2 documentation (here). Click-drag to zoom, shift-click to pan, double-click to autoscale. It's really amazing. Plotly Tutorial: Plotly and R
- grid and gridExtra are useful combined with ggplot2, for example:
- library(grid) # for unit() in ggplot2
- library(gridExtra) # provides grid.arrange()
- grid.arrange(p1, p2, p3, p4, ncol = 2) # p1 to p4 are existing ggplot2 objects
- ggvis is a new package for data visualization. Like ggplot2, it is built on concepts from the grammar of graphics, but it also adds interactivity, a new data pipeline, and it renders in a web browser.
- vcd for visualizing categorical data. Working with categorical data with R and the vcd and vcdExtra packages and Visualizing Categorical Data with SAS and R by Michael Friendly.
- Distribution visualization: vioplot for violin plot (examples of boxplot and violin plot). Hintze (1998). Violin Plots: A Box Plot-Density Trace Synergism. Violin plot using ggplot2. Boxplots and beyond: Part I, II: asymmetry, III: violin plots, IV: beanplots.
- venneuler is my favorite package for making area-proportional Venn and Euler diagrams simply and fast; I use it to create diagrams, then use Inkscape to modify them. A more complicated package is VennDiagram. Wilkinson (2012) introduced his venneuler package in the article "Exact and Approximate Area-proportional Circular Venn and Euler Diagrams". Chen (2011) compared several R packages and programs in the article "VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R" (Create a Venn diagram using MS Excel)
- library(venneuler)
- venn <- venneuler(c(a1c=.15, fpg=.11, ogtt=.13, 'a1c&fpg'=.1, 'fpg&ogtt'=.13, 'a1c&ogtt'=.4, 'a1c&ogtt&unknow'=.1))
- plot(venn)
- Vennerable package provides routines to compute and plot Venn diagrams, including the classic two- and three-circle diagrams but also a variety of others with different properties, such as rectangular or square Venn diagrams, for up to seven sets. It's not available on CRAN. You can install this package using: install.packages("Vennerable", repos = "http://R-Forge.R-project.org")
- googleVis - Google Motion Charts with R
- colourlovers provides access to the COLOURlovers API, which offers color inspiration and color palettes
- Lattice is a powerful and elegant high-level data visualization system, with an emphasis on multivariate data.
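A minimal lattice sketch of a conditioning ("trellis") plot, using the built-in mtcars data; the variables and panel layout are just an illustration. lattice ships with standard R installations as a recommended package.

```r
library(lattice)

# Scatterplot of mpg against weight, one panel per cylinder count,
# with points ("p") plus a least-squares line ("r") in each panel.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1), type = c("p", "r"),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)  # lattice plots are objects; they are drawn when printed
```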
- scatterplot3d provides routines for the visualization of multivariate data in a three dimensional space.
- iPlots is a package for R that provides high-interaction statistical graphics, written in Java.
- rCharts is an R package to create, customize and publish interactive JavaScript visualizations from R using a familiar lattice-style plotting interface.
- wordcloud2 and wordcloud create word clouds for data visualization. Using R to create a Word Cloud from a PDF Document. wordcloud2: possibly the best word cloud solution available today (Chinese-language article)
- David Smith (2017). Packages to simplify mapping in R
- magick wraps ImageMagick, an image-processing library
- networkD3: D3 JavaScript Network Graphs from R, good for Sankey diagrams
- Missing Imputation
- Amelia: A Program for Missing Data (missmap())
- Berk (2021): Handling missing data using XGBoost, XGBoost
- Josse: Handling missing values with R
- MissingDataGUI: a GUI for missing data exploration
- MICE: multiple imputation using fully conditional specification.
- Statistics/Machine Learning
- mlr (Machine Learning in R) and mlr3, its successor. Here is the mlr3 book
- tidymodels: a collection of multiple packages: parsnip, recipes, etc.
- tidyverse is a collection of popular packages for data munging from Hadley Wickham including ggplot2, dplyr, tidyr, readr, purrr, tibble, etc.
- bayestestR provides a comprehensive and consistent set of functions to analyze and describe posterior distributions generated by a variety of model objects, including popular modeling packages such as rstanarm, brms or BayesFactor.
- rms is a package that accompanies the book Regression Modeling Strategies
- caret is a set of functions that attempt to streamline the process for creating predictive models. Here are caret webinar and slides
- splines is now part of the standard R distribution. splines::ns() generates the basis matrix for natural cubic splines. Bendix Carstensen adapted ns() as Ns() in his Epi package, which uses the smallest and the largest of the supplied knots as boundary knots. You can download the latest version on his website here.
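A sketch of base splines::ns() (not Epi::Ns()): with df = 4 and no intercept, ns() places 3 interior knots at quantiles of x and uses range(x) as boundary knots. The simulated sine data are made up for illustration.

```r
library(splines)  # ships with base R

x <- seq(0, 10, length.out = 101)
B <- ns(x, df = 4)         # natural cubic spline basis matrix
dim(B)                     # 101 rows, 4 basis columns
attr(B, "knots")           # interior knots (quantiles of x)
attr(B, "Boundary.knots")  # range of x

# The basis is usually generated inside a model formula.
set.seed(123)
y <- sin(x) + rnorm(length(x), sd = 0.2)
fit <- lm(y ~ ns(x, df = 4))
coef(fit)                  # intercept + 4 spline coefficients
```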
- lavaan: latent variable analysis
- msm::deltamethod: delta method, FAQ UCLA
- nls can be used to determine the nonlinear (weighted) least-squares estimates of the parameters of a nonlinear model, similar to 'nl' in Stata and 'proc nlin' in SAS
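A minimal nls() sketch, assuming an exponential model y = a * exp(b * x). The data are simulated with made-up true values a = 2 and b = 0.3, and the weights in the second fit are hypothetical, only there to show the weights argument.

```r
set.seed(1)
x <- seq(0, 5, by = 0.25)
y <- 2 * exp(0.3 * x) + rnorm(length(x), sd = 0.1)
dat <- data.frame(x = x, y = y, w = 1 / (1 + x))   # w: hypothetical weights

# nls() needs starting values for every parameter in 'start'.
fit <- nls(y ~ a * exp(b * x), data = dat, start = list(a = 1.5, b = 0.2))
summary(fit)   # estimates, standard errors, residual standard error

# Weighted least squares: each squared residual is multiplied by its weight.
fit_w <- nls(y ~ a * exp(b * x), data = dat, start = list(a = 1.5, b = 0.2),
             weights = w)
coef(fit_w)
```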
- rpart is for the recursive partitioning for classification, regression and survival trees.
- Therneau (2015). An introduction to recursive partitioning using the RPART routines
- Classification Trees using the rpart function (2010)
- Let's have a "party" and tear this place "rpart"! (2013)
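A small rpart sketch using the kyphosis data that ships with the package: grow a classification tree, inspect the complexity-parameter table, and prune at the cp with the lowest cross-validated error. Pruning at the minimum xerror is one common rule of thumb, not the only choice.

```r
library(rpart)  # recommended package, installed with standard R distributions

set.seed(42)  # xerror in the cp table comes from random cross-validation folds
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
printcp(fit)  # complexity-parameter table

best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

pred <- predict(pruned, type = "class")   # predictions on the training data
table(observed = kyphosis$Kyphosis, predicted = pred)
```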
- Hmisc: Harrell Miscellaneous
- convey estimates measures of poverty, inequality, and wellbeing using the complex survey data
- deltavar() is a function in the emdbook package, the companion to the book Ecological Models and Data in R; it calculates delta-method-based variances for functions with any number of parameters
- deltavar(log(A/B), meanval=c(A=0.8, B=12), Sigma=matrix(c(0.1,0,0,8.0), nrow=2)) ---> 0.2118056 = (1/A)^2*Var(A) + (1/B)^2*Var(B), a variance (take the square root for the standard error)
- deltavar(log(A/B), meanval=c(A=0.8, B=12), Sigma=c(0.1,8.0)) ---> 0.2118056 (Sigma given as a vector of variances)
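The deltavar() result can be checked by hand: the delta-method variance is grad' Sigma grad, with the gradient of log(A/B) evaluated at the means. This sketch uses only base R (D() for symbolic derivatives), not emdbook itself.

```r
g     <- quote(log(A / B))
mu    <- c(A = 0.8, B = 12)
Sigma <- matrix(c(0.1, 0, 0, 8.0), nrow = 2)   # Var(A) = 0.1, Var(B) = 8, independent

# Gradient at the means: d/dA = 1/A = 1.25, d/dB = -1/B.
grad  <- sapply(c("A", "B"), function(v) eval(D(g, v), as.list(mu)))
var_g <- drop(t(grad) %*% Sigma %*% grad)
var_g        # 0.2118056
sqrt(var_g)  # delta-method standard error
```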
- MASS is a classic package accompanying Venables and Ripley's Modern Applied Statistics with S
- Bolstad is a set of R functions and data sets for the book Introduction to Bayesian Statistics, Bolstad, W.M. (2016). The "bayes.lin.reg" function may be used for combining two estimates, like a meta-analysis.
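The arithmetic behind combining two independent normal estimates can be sketched with inverse-variance weighting, which is also the posterior mean under a normal prior and normal likelihood. This is not the code of bayes.lin.reg(); combine_estimates() is a hypothetical helper and the inputs are made up.

```r
# Pool independent estimates of the same quantity by inverse-variance weighting.
combine_estimates <- function(est, se) {
  w <- 1 / se^2                     # precision of each estimate
  pooled    <- sum(w * est) / sum(w) # precision-weighted mean
  pooled_se <- sqrt(1 / sum(w))      # precisions add
  c(estimate = pooled, se = pooled_se)
}

combine_estimates(est = c(1.2, 0.8), se = c(0.3, 0.4))  # hypothetical inputs
```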
- Psych: Using R for psychological research
- Electronic Health/Medical Record (EHR) related