Tuesday, December 15, 2015

general linear models vs. generalized linear models

General linear models vs. generalized linear models


 



                             General linear model     Generalized linear model
Typical estimation method
Special cases
Function in R
Function in Matlab           mvregress()              glmfit()
Procedure in SAS
Command in Stata
Function in Mathematica      LinearModelFit           GeneralizedLinearModelFit
Command in EViews            ls

  • Generalized linear models have the flexibility to handle response variables whose distribution is not normal. If a generalized linear model uses an identity link function and a normal family distribution, then it is equivalent to a general linear model.
  • Generalized linear mixed models have the flexibility to model random effects and correlated errors for non-normal data.
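
The equivalence in the first bullet can be written out explicitly. The notation here (response Y, design matrix X, coefficients β, link function g) is assumed for illustration and does not come from the original table; it is a sketch of the single-response case only.

```latex
% General linear model (single response)
Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)

% Generalized linear model: exponential-family response with link function g
g(\mu) = X\beta, \qquad \mu = \operatorname{E}[Y \mid X]

% Identity link, g(\mu) = \mu, combined with a normal family
\mu = X\beta, \qquad Y \mid X \sim N(X\beta, \sigma^2 I)
```

The last line describes the same model as the first, which is why the two approaches coincide in that special case.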

non-probability sample

Non-Probability Sample


Definition
Reflection
Video

Friday, November 20, 2015

All-cause mortality was increasing among US middle-aged whites

All-cause mortality was increasing among US middle-aged Whites


Title: Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century
Authors: Anne Case and Angus Deaton
Abstract: This paper documents a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change reversed decades of progress in mortality and was unique to the United States; no other rich country saw a similar turnaround. The midlife mortality reversal was confined to white non-Hispanics; black non-Hispanics and Hispanics at midlife, and those aged 65 and above in every racial and ethnic group, continued to see mortality rates fall. This increase for whites was largely accounted for by increasing death rates from drug and alcohol poisonings, suicide, and chronic liver diseases and cirrhosis. Although all education groups saw increases in mortality from suicide and poisonings, and an overall increase in external cause mortality, those with less education saw the most marked increases. Rising midlife mortality rates of white non-Hispanics were paralleled by increases in midlife morbidity. Self-reported declines in health, mental health, and ability to conduct activities of daily living, and increases in chronic pain and inability to work, as well as clinically measured deteriorations in liver function, all point to growing distress in this population. We comment on potential economic causes and consequences of this deterioration.
Full text: PNAS

Saturday, October 10, 2015

How to recover a lost partition of a hard drive



There are two main reasons a drive letter might not appear on your computer: the logical drive letter has been lost, or the partition table is corrupted.

Try these steps first:
  • Open the "Run" dialog by holding the "Windows" key and pressing the "R" key
  • Type and run 'diskmgmt.msc'
  • "Disk Management" will be shown.
  • If you see a partition without a drive letter, right-click on it
  • Select "Change Drive Letter and Paths..."
  • Click the "Add" button
  • Select a drive letter and click OK.
If you do not see a partition without a drive letter, the partition table may be corrupted. Try TestDisk, following the original instructions here or the abridged steps below:
  • Download TestDisk
  • Unzip it and save it on a USB drive
  • Run "testdisk_win"
  • At the first window, select “No Log” and press the key
  • Select which drive to analyse, choose “Proceed” and press key
  • Select partition type (select "Intel" if it’s a PC) then press key
  • Select “Analyse” then press key
  • Select “Quick Search” at the next screen, then press  key
  • Press key, if the partitions were created under Vista – press key if not.
  • TestDisk should say “Structure OK”. If so, press key 
  • Select “Write” and press key and press   key to confirm.
  • "ok" to reboot the computer, press key
  • Now, close TestDisk and RESTART the computer.

Sunday, June 28, 2015

R documentation and Learning Resources

R! documentation and Learning Resources

Monday, June 01, 2015

Running and longevity

Running and Longevity

News
  • Daily Mail: Stop that binge jogging
    • Long-term study found slow joggers had the lowest rates of death
    • Strenuous joggers were as likely to die as sedentary non-joggers
    • Going jogging three times a week for no more than 2.4 hours is optimal
    • The researchers noted that the pace of the slow joggers corresponds to vigorous exercise, and strenuous jogging corresponds to very vigorous exercise
  • ScienceDaily: Light jogging may be most optimal for longevity
Guidelines and Research

Sunday, February 08, 2015

The Roseto Mystery


A few days ago, I read Outliers (2008) by Malcolm Gladwell. The book is quite insightful. Children, parents, grandparents (ha-ha), and educators can all benefit from reading it. The introductory chapter, which is the part most related to my work, is about the Rosetans living in Pennsylvania. The Rosetans are outliers: they ate large amounts of saturated fat, including lard, and had other bad habits such as smoking. However, “THESE PEOPLE WERE DYING OF OLD AGE. THAT’S IT.” Gladwell presents evidence for the importance of a low-stress lifestyle and good support from family, friends, and community for health and longevity.

Here are a few more stories related to the Rosetans living in Roseto, Pennsylvania:

Monday, January 12, 2015

Cancer isn’t just bad luck

By Thomas Lumley

"
From Stuff, "Bad luck is responsible for two-thirds of adult cancer while the remaining cases are due to environmental risk factors and inherited genes, researchers from the Johns Hopkins Kimmel Cancer Center found."

...

So, in summary: the “two-thirds of cancers explained” is Just Wrong. Doing a mathematically correct calculation gives about one third. Doing a calculation that’s actually relevant to cancer in the population gives even smaller values. (update) That’s not to say that DNA replication errors are unimportant — the paper makes it clear that they are important.
"

Full text: Cancer isn't just bad luck

Friday, November 28, 2014

Chris Botti in Boston, 2008.


Beautiful Sound of Trumpet by Chris Botti in Boston, 2008

Christopher Stephen "Chris" Botti (born October 12, 1962), is an Italian-American trumpeter and composer. Wikipedia - Chris Botti

Pavel Levin, "Chris Botti in Boston features trumpeter Chris Botti along with a bevy of name artists performing live with the Boston Pops Orchestra at Symphony Hall in 2008. Fully documented as a concert film and album, the night is an intimate and soulful birds-eye view of the supple-toned trumpeter who has grown into his role as a virtuoso since his time backing up Sting ..." 



Tuesday, August 19, 2014

Data Cleaning is a critical part of the Data Science process

by David Smith

A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning, and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying,

     “It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.”

As an illustration of this point, check out the essay by Julia Evans, Machine learning isn't Kaggle competitions (hat tip: Drew Conway). A Kaggle competition typically presents a nice, clean, regularized data set to the competitors, but this isn't representative of the real-world process of making predictions from data. As Julia points out:

     Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.

While there are projects underway to help automate the data cleaning process and reduce the time it takes, the task of automation is made difficult by the fact that the process is as much art as science, and no two data preparation tasks are the same. That's why flexible, high-level languages like R are a key part of the process. As Mitchell Sanders notes in a Tech Republic article,

     Data science requires a difficult blend of domain knowledge, math and statistics expertise, and code hacking skills. In particular, he suggests that expert knowledge of tools like R and SAS are critical. "If you can't use the tools, you can't analyze the data."

This is a critical step to gaining any kind of insight from data, which is why data scientists still command premium salaries today, according to data from Indeed.com.

Tuesday, August 05, 2014

How to generate a color-coded/formatted Word file of R programs in RStudio

RMarkdown: Generate a color-coded/formatted Word file of R! using RStudio

Here is my approach/cheat sheet/template:

If you have code like this:
 
     library("xxx")
     summary(usmort)
 
Method I (using RStudio)
  • Start a new R Markdown file (File -> New File -> R Markdown).
  • Delete all the code that RStudio generated in that new file.
  • Copy and paste the R code above into the new file.
  • Add the global options (between --- and ---), a chunk header (```{r}) with the option eval=FALSE, and a chunk tail (```):
    ---
    output:
      word_document:
        highlight: pygments
    ---
    ```{r,eval=FALSE}
    ... your code here ...
    ```
  • Click the ‘Knit Word’ button to get a Word file with the code shown as below:
     library("xxx")
     summary(usmort)
 
Method II (with or without using RStudio)
  • Add the chunk header (```{r,eval=FALSE}) and tail (```) to wrap the whole code, and save the file ("xxx_r_code.R").
  • Run these R commands in the console window:
       > library(rmarkdown)
       > render("xxx_r_code.R", word_document(highlight="pygments"))

Thursday, June 19, 2014

Poison Ivy

 
Diagnose Poison Ivy Rash
Treat Poison Ivy Rash
Identify Poison Ivy
 
Remove Poison Ivy
 
 

Wednesday, April 23, 2014

Simpson's Paradox


Simpson's paradox (or the Yule-Simpson effect) is an unexpected result in which an association/correlation between x and y that holds within each of several groups disappears or reverses when the groups are combined.
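
A small numeric sketch may make the reversal concrete. The counts below are illustrative only (they are not from this post), and the group names, treatment labels, and helper functions are hypothetical. Treatment "A" has the higher success rate inside each group, yet "B" looks better once the groups are pooled, because "A" was used mostly in the harder group.

```python
from fractions import Fraction

# Illustrative counts (an assumption for this sketch): (successes, trials)
# for two treatments, A and B, within two groups.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate as an exact fraction."""
    return Fraction(successes, trials)


def pooled(treatment):
    """Success rate for one treatment after combining all groups."""
    successes = sum(counts[g][treatment][0] for g in counts)
    trials = sum(counts[g][treatment][1] for g in counts)
    return rate(successes, trials)


def better_within_every_group(t1, t2):
    """True if t1 has a strictly higher success rate than t2 in each group."""
    return all(rate(*counts[g][t1]) > rate(*counts[g][t2]) for g in counts)


# Per-group comparison, then the pooled comparison.
for group, by_treatment in counts.items():
    for treatment, (s, n) in by_treatment.items():
        print(f"{group:5s} {treatment}: {float(rate(s, n)):.3f}")
for treatment in ("A", "B"):
    print(f"all   {treatment}: {float(pooled(treatment)):.3f}")
```

Exact fractions are used for the comparisons so that the reversal does not depend on floating-point rounding.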

Tuesday, April 15, 2014

uncertainty

When scientists and policy makers face uncertainty

  • Schmitt (2014). "The Pitfall of Uncertainty" (The Scientist)
    • The Scientist published an opinion article about what uncertainty means to scientists and policy makers. "...they must be crystal clear: science is about discovery and (decreasing) uncertainty, policymaking is about achieving consensus (if not certainty). Together, scientists and policymakers alike must strive to make responsible decisions for the benefit of society."
  • NPR (2017). Alan Alda's Experiment: Helping Scientists Learn To Talk To The Rest Of Us
  • Wikipedia:
    • An intuitive figure of true vs. truth from Starecat.com: This is TRUE, this is TRUE, this is TRUTH. I like this figure indeed. However, 'true' is an adjective and 'truth' is a noun. I would like to change this to "This is TRUE, this is TRUE, this is THE TRUTH" or "This is a FACT, this is a FACT, this is THE TRUTH"
    


Thursday, April 03, 2014

Proposed Revisions to the Common Rule

Proposed Revisions to the Common Rule for the Protection of Human Subjects in the Behavioral and Social Sciences

It examines how to update human subjects protections regulations so that they effectively respond to current research contexts and methods. With a specific focus on social and behavioral sciences, this consensus report aims to address the dramatic alterations in the research landscapes that institutional review boards (IRBs) have come to inhabit during the past 40 years. The report aims to balance respect for the individual persons whose consent to participate makes research possible and respect for the social benefits that productive research communities make possible... (full text)