Showing posts with label Statistics. Show all posts

Wednesday, May 15, 2019

effect size, P value, and Bayes odds

P value, effect size, and Bayes factor, and after one year

Wednesday, August 09, 2017

how to estimate the risk and relative risk from logistic regression

How to estimate the risk and relative risk from logistic regression of a case-control study (OR to RR)
One advantage of the odds ratio is that it can be estimated in a case-control study, where we usually oversample the cases. However, we cannot directly calculate the probability (risk) from a logistic regression fitted to case-control data, because the intercept beta(0) of the case-control study does not represent the target population. The beta(0) of the target population is equal to the beta(0) of the case-control study - log(sampling probability for cases/sampling probability for controls).
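The intercept correction above can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the coefficient values and sampling probabilities are hypothetical, the function names `corrected_intercept` and `risk` are made up for this example, and the sampling probabilities are assumed to be known from the study design.

```python
import math

def corrected_intercept(beta0_cc, p_case, p_control):
    """Target-population intercept from a case-control intercept.

    beta0_cc  : intercept estimated in the case-control sample
    p_case    : sampling probability for cases (assumed known)
    p_control : sampling probability for controls (assumed known)
    """
    return beta0_cc - math.log(p_case / p_control)

def risk(beta0, betas, x):
    """Predicted probability from a logistic model."""
    xb = beta0 + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-xb))

# Hypothetical numbers, for illustration only.
beta0_cc = -0.5            # intercept from the case-control fit
betas = [0.7]              # log odds ratio for one binary exposure
p_case, p_control = 1.0, 0.01

beta0_pop = corrected_intercept(beta0_cc, p_case, p_control)
risk_exposed = risk(beta0_pop, betas, [1])
risk_unexposed = risk(beta0_pop, betas, [0])
relative_risk = risk_exposed / risk_unexposed
odds_ratio = math.exp(betas[0])

print(f"risk (exposed)   = {risk_exposed:.4f}")
print(f"risk (unexposed) = {risk_unexposed:.4f}")
print(f"RR = {relative_risk:.3f}, OR = {odds_ratio:.3f}")
```

Only the intercept changes; the slope coefficients (and therefore the odds ratios) are the same before and after the correction. When the outcome is rare in the target population, the relative risk computed this way is close to the odds ratio.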

Friday, March 24, 2017

R functions and keyboard shortcuts

R functions/commands and keyboard shortcuts
R is powerful and has a rich collection of packages and functions. It's impossible to build a list of functions and shortcuts that fits everyone's purposes. Below are some functions and shortcuts related to my projects.
  • Cheatsheets: The R Guide, R Reference Card
  • Help functions: help()/?, apropos() (finds all objects whose names match a pattern), find() (shows where the found objects are located), methods(), example(), demo(), vignette(), args()
  • Housekeeping functions: getwd(), setwd(), rm(list=ls()) removes all objects in the R environment, source("myRscript.r") runs the R code in the "myRscript.r" file, fix() modifies the original object, edit() edits a copy of an object and returns the new object, download.file() downloads a file from the Internet, attach()/detach() objects, search() shows the current search path in order, install.packages(), update.packages(), remove.packages(), getOption("defaultPackages") shows the default packages, which can be changed by setting the option in startup code (e.g. in ~/.Rprofile), .libPaths()
  • Numeric/character functions: length(), seq(), rep(), cut(), pretty(), cat(), substr(), grep(), sub(), strsplit(), paste(), toupper(), tolower()
  • Data functions: read.table(), head(), tail(), str(), class(), length(), dim(), nrow(), ncol(), names(), levels(), c(), cbind(), rbind(), append(), rep(), rev(), sort(), unique()
  • Type functions: "is." for checking or "as." for conversion + numeric(), character(), vector(), matrix(), data.frame(), factor(), logical(), integer(). For example: is.numeric(), as.numeric()
  • Mathematical functions: abs(), sqrt(), log(), log(x, base=n), log10(), exp(), prod(), factorial(), choose(), ceiling(), floor(), solve(), trunc(), round(), signif(), cos(), sin(), tan(), acos()
  • Statistical functions: mean(), median(), sd(), var(), mad(), quantile(), range(), sum(), diff(), min(), max(), scale(), fivenum(), cumsum(), cumprod(), cummax(), cummin(), cor(), colSums(), rowSums(), colMeans(), rowMeans()
  • Probability functions: the form is [d][p][q][r]distribution(), where d, p, q, and r stand for the (d)ensity, the cumulative (p)robability (distribution function), the (q)uantile function, and (r)andom generation, respectively. The distribution types include (norm)al, (beta), (binom)ial, (chisq)uared, (exp)onential, (logis)tic, (multinom)ial, negative binomial (nbinom), (pois)son, (f), (gamma), (t), (unif)orm, etc. For example: dnorm(), pnorm(), qnorm(), rnorm()
  • Statistical modeling functions
    • Model functions: lm(), glm(), nls(), nls2(), lme() / nlme()
    • Formula symbols (y ~ A + B + C): ":" is for an interaction term, "*" is for complete interaction (main effects plus all their interactions), "^" is for crossing to a specified degree, "." is a placeholder for all other variables except the dependent variable, "-" removes a variable from the equation, "-1" suppresses the intercept, "I()" has the elements within the parentheses interpreted arithmetically
    • Post-estimation functions: coef(), confint(), resid(), fitted(), summary(), predict(), deviance(), print(), plot(), formula(), anova(obj1, obj2), AIC(), vcov()
    • Contrast functions: contr.helmert(), contr.poly(), contr.sum(), contr.treatment(), contr.SAS()
  • RStudio is an integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R. Shortcuts (you can modify them: Tools -> Modify Keyboard Shortcuts...)
    • Alt + Shift + K: Show a Quick Reference
    • Alt + -: Insert assignment operator "<-"
    • Ctrl + Shift + M: Insert pipe operator "%>%" (I changed it to Ctrl + Shift + P)
    • Ctrl + Alt + I: Insert chunk (R Notebook/Markdown)
    • Ctrl + 1: Move cursor to source Editor window
    • Ctrl + 2: Move cursor to Command window
    • Ctrl + 3: Move cursor to Help window
    • Ctrl + 4: Move cursor to History window
    • Ctrl + 5: Move cursor to File window
    • Ctrl + 6: Move cursor to Plots window
  • ...

Monday, March 13, 2017

choice of analytical language

Choice of analytical language
I have mainly used three statistical languages, Stata, R, and SAS, for many years and for different purposes. My relative usage of the three has shifted from SAS-Stata-R to SAS-R-Stata, and then to Stata-R-SAS. Sometimes I am asked to recommend the better analytic language, which is always a hard and complicated question for me. I came across a blog post by Curtis Miller that is very thoughtful and helpful for making this kind of choice. Here is his post: "On Programming Languages; Why My Dad Went From Programming to Driving a Bus". Hopefully his story can help you make your own decision.

Tuesday, January 03, 2017

Cheng YJ, Gregg EW, Rolka DB, Thompson TJ.

BACKGROUND:

Monitoring national mortality among persons with a disease is important to guide and evaluate progress in disease control and prevention. However, a method to estimate nationally representative annual mortality among persons with and without diabetes in the United States does not currently exist. The aim of this study is to demonstrate use of weighted discrete Poisson regression on national survey mortality follow-up data to estimate annual mortality rates among adults with diabetes.

METHODS:

To estimate mortality among US adults with diabetes, we applied a weighted discrete time-to-event Poisson regression approach with post-stratification adjustment to national survey data. Adult participants aged 18 or older with and without diabetes in the National Health Interview Survey 1997-2004 were followed up through 2006 for mortality status. We estimated mortality among all US adults, and by self-reported diabetes status at baseline. The time-varying covariates used were age and calendar year. Mortality among all US adults was validated using direct estimates from the National Vital Statistics System (NVSS).

RESULTS:

Using our approach, annual all-cause mortality among all US adults ranged from 8.8 deaths per 1,000 person-years (95% confidence interval [CI]: 8.0, 9.6) in year 2000 to 7.9 (95% CI: 7.6, 8.3) in year 2006. By comparison, the NVSS estimates ranged from 8.6 to 7.9 (correlation = 0.94). All-cause mortality among persons with diabetes decreased from 35.7 (95% CI: 28.4, 42.9) in 2000 to 31.8 (95% CI: 28.5, 35.1) in 2006. After adjusting for age, sex, and race/ethnicity, persons with diabetes had 2.1 (95% CI: 2.01, 2.26) times the risk of death of those without diabetes.

CONCLUSION:

Period-specific national mortality can be estimated for people with and without a chronic condition using national surveys with mortality follow-up and a discrete time-to-event Poisson regression approach with post-stratification adjustment. (Full text)

Wednesday, February 10, 2016

accept-reject algorithm

Accept-reject algorithm
Accept-reject algorithm (acceptance-rejection method), or rejection sampling, is a simple and general simulation method for drawing observations from a target distribution f(x) by proposing candidates from an easier distribution g(x) and accepting each candidate with some probability. The same accept-or-reject comparison converts a probability into a dichotomous condition (i.e. yes or no). Basically, there are three steps:
  • Step 1. Generate X from the proposal density g and compute Y = f(X), where f is the pdf of the target distribution
    • Sample a point (an x-position) from the proposal distribution [X ~ g(x)], draw a vertical line at this point, and read off the target density there (a y-position). The proposal must envelop the target: f(x) <= cg(x) for all x, for some constant c >= 1.
  • Step 2. Generate U from the uniform distribution on the interval (0, cg(X))
    • Sample uniformly along the vertical line, from 0 to the envelope height cg(X) [equivalently, U = cg(X) x runif(0, 1)]
  • Step 3. If U <= Y, then keep X ("accept"); otherwise reject X and repeat Steps 1 and 2
Pr(X|accept) = Pr(accept|X) x Pr(X)/Pr(accept), using Bayes' theorem
Pr(accept|X) = f(x)/cg(x)
Pr(X) = g(x)
Pr(accept) = 1/c
therefore, Pr(X|accept) = f(x)
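
The three steps and the result Pr(accept) = 1/c can be checked with a short Python sketch. The choices here are assumptions made only for illustration: the target f is the Beta(2, 2) density, f(x) = 6x(1 - x), the proposal g is Uniform(0, 1), and c = 1.5 is the maximum of f, so that f(x) <= cg(x) everywhere.

```python
import random

def target_pdf(x):
    """Target density f: Beta(2, 2), f(x) = 6x(1 - x) on (0, 1)."""
    return 6.0 * x * (1.0 - x)

def proposal_pdf(x):
    """Proposal density g: Uniform(0, 1)."""
    return 1.0

C = 1.5  # envelope constant: max of f is f(0.5) = 1.5, so f(x) <= C * g(x)

def accept_reject(n, rng):
    """Draw n samples from f; return the samples and the number of proposals."""
    samples, proposals = [], 0
    while len(samples) < n:
        x = rng.random()                            # Step 1: X ~ g
        y = target_pdf(x)                           #         Y = f(X)
        u = rng.uniform(0.0, C * proposal_pdf(x))   # Step 2: U ~ Uniform(0, c g(X))
        proposals += 1
        if u <= y:                                  # Step 3: accept X
            samples.append(x)
    return samples, proposals

rng = random.Random(770488)
draws, tried = accept_reject(20000, rng)
sample_mean = sum(draws) / len(draws)
acceptance_rate = len(draws) / tried

print(f"sample mean     = {sample_mean:.3f}  (Beta(2, 2) mean is 0.5)")
print(f"acceptance rate = {acceptance_rate:.3f}  (1/c = {1 / C:.3f})")
```

A tighter envelope (smaller c) wastes fewer proposals, which is why the proposal g is usually chosen to resemble the target as closely as is convenient.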


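Example: simulating data in Stata and defining the event


clear
set seed 770488
set obs 1000

* Two covariates, each uniform on (-0.5, 0.5)
gen x = runiform() - .5
gen z = runiform() - .5
gen xb = x + 8*z    // linear predictor (log odds)

* y = 1 when the uniform draw exceeds 1/(1 + exp(xb)), so Pr(y = 1) = invlogit(xb)
gen y = 1 / (1 + exp(xb)) < runiform()    // y defined as 0 or 1
logistic y x z      // reports odds ratios; the true values are exp(1) and exp(8)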
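
The direction of the inequality in the `gen y` line is easy to get backwards, so here is a small Python check of the same rule. It is only a sketch: it reuses the seed value from the Stata code, but Python's random stream differs from Stata's, so the simulated data will not match. It compares the observed event rate with the average of invlogit(xb).

```python
import math
import random

rng = random.Random(770488)  # same seed value as the Stata code, different stream

def invlogit(xb):
    """Inverse logit: exp(xb) / (1 + exp(xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

n = 100_000
events = 0
expected = 0.0
for _ in range(n):
    x = rng.random() - 0.5
    z = rng.random() - 0.5
    xb = x + 8 * z
    # Same rule as the Stata line: y = 1 when 1/(1 + exp(xb)) < U
    y = 1 if 1.0 / (1.0 + math.exp(xb)) < rng.random() else 0
    events += y
    expected += invlogit(xb)

observed_rate = events / n
expected_rate = expected / n
print(f"observed Pr(y = 1) = {observed_rate:.3f}")
print(f"mean invlogit(xb)  = {expected_rate:.3f}")
```

The two rates agree up to simulation noise, which confirms that the comparison generates y with Pr(y = 1) = invlogit(xb).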






Tuesday, December 15, 2015

general linear models vs. generalized linear models

General linear models vs. generalized linear models
                             General linear model    Generalized linear model
Typical estimation method
Special cases
Function in R
Function in Matlab           mvregress()             glmfit()
Procedure in SAS
Command in Stata
Function in Mathematica      LinearModelFit          GeneralizedLinearModelFit
Command in EViews            ls
  • Generalized linear models have the flexibility to handle response variables whose distribution is not normal. If a generalized linear model uses an identity link function and a normal family distribution, then this model is equivalent to a general linear model.
  • Generalized linear mixed models have the flexibility to model random effects and correlated errors for nonnormal data.
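
The equivalence in the first bullet can be seen numerically. The sketch below is an illustration under stated assumptions, not any package's implementation: it uses hypothetical toy data, a single predictor, and one step of iteratively reweighted least squares (IRLS), the usual fitting scheme for generalized linear models. With an identity link and a normal family, the working weights are all 1 and the working response is y itself, so the step reduces to ordinary least squares.

```python
def wls_line(x, z, w):
    """Weighted least squares fit of z = a + b*x; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sz = sum(wi * zi for wi, zi in zip(w, z))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    b = (sw * sxz - sx * sz) / (sw * sxx - sx * sx)
    a = (sz - b * sx) / sw
    return a, b

def irls_step(x, y, a, b, inv_link, dlink, variance):
    """One IRLS update for a GLM with linear predictor eta = a + b*x."""
    eta = [a + b * xi for xi in x]
    mu = [inv_link(e) for e in eta]
    # Working response and working weights
    z = [e + (yi - m) * dlink(m) for e, yi, m in zip(eta, y, mu)]
    w = [1.0 / (variance(m) * dlink(m) ** 2) for m in mu]
    return wls_line(x, z, w)

# Hypothetical toy data
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

# General linear model: ordinary least squares
ols = wls_line(x, y, [1.0] * len(x))

# Generalized linear model: identity link, normal family, one IRLS step
glm = irls_step(x, y, a=0.0, b=0.0,
                inv_link=lambda eta: eta,   # identity link
                dlink=lambda mu: 1.0,       # d(eta)/d(mu) = 1
                variance=lambda mu: 1.0)    # constant variance

print("OLS (intercept, slope):", ols)
print("GLM (intercept, slope):", glm)
```

For any other link or family the weights and working response depend on the current fit, so IRLS has to iterate; here a single step from any starting values already lands on the least-squares solution.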