Friday, November 28, 2014

Chris Botti in Boston, 2008.


Beautiful Sound of Trumpet by Chris Botti in Boston, 2008

Christopher Stephen "Chris" Botti (born October 12, 1962) is an Italian-American trumpeter and composer. Wikipedia - Chris Botti

Pavel Levin, "Chris Botti in Boston features trumpeter Chris Botti along with a bevy of name artists performing live with the Boston Pops Orchestra at Symphony Hall in 2008. Fully documented as a concert film and album, the night is an intimate and soulful birds-eye view of the supple-toned trumpeter who has grown into his role as a virtuoso since his time backing up Sting ..." 



Tuesday, August 19, 2014

Data Cleaning is a critical part of the Data Science process

by David Smith

A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning, and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling, or anything else, it's a critical part of data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying,

     “It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.”

As an illustration of this point, check out the essay by Julia Evans, Machine learning isn't Kaggle competitions (hat tip: Drew Conway). A Kaggle competition typically presents a nice, clean, regularized data set to the competitors, but this isn't representative of the real-world process of making predictions from data. As Julia points out:

     Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.

While there are projects underway to help automate the data cleaning process and reduce the time it takes, automation is made difficult by the fact that the process is as much art as science, and no two data preparation tasks are the same. That's why flexible, high-level languages like R are a key part of the process. As Mitchell Sanders notes in a Tech Republic article,

     Data science requires a difficult blend of domain knowledge, math and statistics expertise, and code hacking skills. In particular, he suggests that expert knowledge of tools like R and SAS is critical. "If you can't use the tools, you can't analyze the data."

This is a critical step to gaining any kind of insight from data, which is why data scientists still command premium salaries today, according to data from Indeed.com.

Tuesday, August 05, 2014

How to generate a color-coded/formatted Word file of R programs in RStudio

RMarkdown: Generate a color-coded/formatted Word file of R code using RStudio

Here is my approach/cheat sheet/template:

If you have code like this:
 
     library("xxx")    
     summary(usmort)
 
Method I (using RStudio)
  • Start a new R Markdown file (File -> New File -> R Markdown).
  • Delete all the code that RStudio generated in the new file.
  • Copy and paste the R code above into the new file.
  • Add the global options (between --- and ---), a chunk head (```{r}) with the option eval=FALSE, and a chunk tail (```):
    ---
    output:
      word_document:
        highlight: pygments
    ---
    ```{r,eval=FALSE}
                 ... your codes here ...
    ``` 
  • Click the 'Knit Word' button to get a Word file with the code shown like below: 
     library("xxx")
     summary(usmort)
 
Method II (with or without using RStudio)
  • Add a chunk head (```{r,eval=FALSE}) and a chunk tail (```) to wrap the whole code, and save the file ("xxx_r_codes.R").
  • Run these R commands in the console window:
    >library(rmarkdown)
    >render("xxx_r_codes.R",word_document(highlight="pygments"))
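
The steps above can also be scripted. The following is only a sketch under assumptions: it writes the wrapped code to a temporary .Rmd file (not the .R file named above, since as far as I know render() treats .R scripts differently), the file name is a placeholder, and rendering is skipped when rmarkdown or pandoc is not installed.

```r
# Build the three-backtick fence without typing it literally in this block
fence <- strrep("`", 3)

# The two example lines from this post ("xxx" and usmort are placeholders)
code_lines <- c('library("xxx")', "summary(usmort)")

# YAML header (note the indentation) followed by one non-evaluated chunk
rmd_lines <- c(
  "---",
  "output:",
  "  word_document:",
  "    highlight: pygments",
  "---",
  "",
  paste0(fence, "{r, eval=FALSE}"),
  code_lines,
  fence
)

# Hypothetical file name, written to a temporary folder
rmd_file <- file.path(tempdir(), "xxx_r_codes.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to Word only when rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Because the header already asks for word_document with pygments highlighting, render() needs no second argument here; the Word file lands next to the .Rmd file in the temporary folder.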

Thursday, June 19, 2014

Poison Ivy

 
Diagnose Poison Ivy Rash
Treat Poison Ivy Rash
Identify Poison Ivy
 
Remove Poison Ivy
 
 

Wednesday, April 23, 2014

Simpson's Paradox


Simpson's paradox (or the Yule-Simpson effect) is an unexpected result in which an association/correlation between x and y that appears in different groups disappears or reverses when the groups are combined.
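
A small R sketch of the reversal, using made-up numbers (the data frame d and its values are hypothetical, chosen only to make the effect easy to see). Within each group y rises with x, but group B sits at higher x and lower y than group A, so the pooled correlation turns negative.

```r
# Hypothetical toy data: y increases with x inside each group
d <- data.frame(
  group = rep(c("A", "B"), each = 5),
  x     = 1:10,
  y     = c(11, 12, 13, 14, 15, 2, 3, 4, 5, 6)
)

# Correlation of x and y computed separately within each group
within <- sapply(split(d, d$group), function(g) cor(g$x, g$y))

# Correlation of x and y after combining the two groups
pooled <- cor(d$x, d$y)

print(within)  # positive in both groups
print(pooled)  # negative once the groups are combined
```

Plotting y against x with one color per group shows the same thing visually: two rising lines, one far below and to the right of the other.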

Tuesday, April 15, 2014

uncertainty

When scientists and policymakers face uncertainty

  • Schmitt (2014). "The Pitfall of Uncertainty" (The Scientist)
    • The Scientist published an opinion article about what uncertainty means to scientists and policymakers. "...they must be crystal clear: science is about discovery and (decreasing) uncertainty, policymaking is about achieving consensus (if not certainty). Together, scientists and policymakers alike must strive to make responsible decisions for the benefit of society."
  • NPR (2017). Alan Alda's Experiment: Helping Scientists Learn To Talk To The Rest Of Us
  • Wikipedia:
    • An intuitive figure of true vs. truth from Starecat.com: This is TRUE, this is TRUE, this is TRUTH. I like this figure a lot. However, 'true' is an adjective and 'truth' is a noun, so I would change the caption to "This is TRUE, this is TRUE, this is THE TRUTH" or "This is a FACT, this is a FACT, this is THE TRUTH".
    


Thursday, April 03, 2014

Proposed Revisions to the Common Rule

Proposed Revisions to the Common Rule for the Protection of Human Subjects in the Behavioral and Social Sciences

It examines how to update human subjects protections regulations so that they effectively respond to current research contexts and methods. With a specific focus on social and behavioral sciences, this consensus report aims to address the dramatic alterations in the research landscapes that institutional review boards (IRBs) have come to inhabit during the past 40 years. The report aims to balance respect for the individual persons whose consent to participate makes research possible and respect for the social benefits that productive research communities make possible... (full text)

Friday, February 28, 2014

Unconventional view of type 2 diabetes causation proposed

Source: MedicalPress

At 85, Nobel laureate James D. Watson, the co-discoverer of the double-helix structure of DNA, continues to advance intriguing scientific ideas. His latest, a hypothesis on the causation of type 2 diabetes, is to appear 7 pm Thursday US time in the online pages of The Lancet, the prestigious British medical journal.
 
Watson's hypothesis suggests that diabetes, dementias, cardiovascular disease, and some cancers are linked to a failure to generate sufficient biological oxidants, called reactive oxygen species (ROS). Watson also argues the case for a better understanding of the role of exercise in helping to remedy this deficiency. ...
 

Thursday, February 20, 2014

Tips - EndNote

EndNote

Tutorial
Entering and Editing References
  • When entering initials instead of full names, be sure to type a period (.) or a space between initials:
    Fisher, J.O.
    J O Fisher
  • When entering corporate authors, put a comma (,) after the name:
    U.S. Department of Agriculture,
    Apple Computer Inc.,

Bibliographic Styles (Output Style)
  • EndNote Reference Style Files
  • How to set Hanging Indent
    • The EndNote help on the "Hanging indent" options is not very helpful. The options are in a dropdown hidden in the lower-right corner after you go through the menu "Edit -> Output Styles -> Bibliography -> Layout", and it is easy to miss. The options are: None, All Paragraphs, First Paragraph Only, Second Paragraph Only, and All Paragraphs but the First.
    • If you are creating a numbered bibliography, you need to insert a tab after the bibliography number so that the references line up correctly. (Use the "Insert Field" dropdown in the upper-right corner to insert a tab.)
    • In MS Word, the tab size (and even the hanging indent) can be changed by selecting the whole bibliography and sliding the markers on the ruler.
  • How to change the line spacing of your bibliography (I hate to say it, but this is not a logical/intuitive way)
    • In EndNote (please don't confuse this with 'Cite While You Write' in MS Word), go to Tools > Cite While You Write > Format Bibliography. EndNote then opens the 'Configure Bibliography' window of MS Word; click the 'Layout' tab and change the line spacing there.
    • Or, in the MS Word document, select the 'EndNote' tab, click the small arrow at the corner of the 'Bibliography' block, then click the 'Layout' tab. It may take minutes to activate; you can then change the bibliography format there.
    • The solution provided by the EndNote website is not very clear.
  • Special Formatting Characters for Templates:
    • Ctrl+Alt+Space (non-breaking space, or pick it from the Insert Field list): links adjacent text, e.g. Edition◊ed.
    • | (vertical bar): forces separation, e.g. Author. ·"Title."·Journal Volume|.Issue|·(Year)|:· Pages|
    • `xxx` (the backtick, on the same key as the tilde (~)): forces EndNote to interpret the word between the backticks as plain text and not as a field name, e.g. (Editor) vs. (`Editor`)
    • ^ (caret): makes the label before the pages appear in singular or plural form depending on the number of pages, e.g. p^pp◊Pages.
  • How to Get Endnote to Abbreviate Journal Names: Journal Term List.
  • How to reset EndNote Web library
  • More FAQs of The University of Toledo
  • If you lose files when converting an X9 library into a 20 library, try: The proper process for recovering library files
When you search PubMed online from within EndNote, the record includes the URL of the PubMed article. However, quite annoyingly, when PubMed.com outputs a file for a citation manager, there is no URL information. To add it, highlight the EndNote records in your library and go to the "References" menu => "Find Reference Update".