Wednesday, May 15, 2019

P value, effect size, and Bayes factor, and after one year

- 2021
- Cole (2020): Surprise! (P value vs. S value)
- KDnuggets by Bills (2021): Null Hypothesis Significance Testing is Still Useful
- 2019: Another wave beyond the P-value:
- Greenland (2019): Valid p-values behave exactly as they should: Some misleading criticisms of p-values and their resolution with s-values.
- Benjamin (2019): Three recommendations for improving the use of p-values
- Blume (2019): An introduction to second-generation p-values
- Tarran (2019): Is this the end of "statistical significance"?
- American Statistician (2019): several articles on the p-value and its alternatives
- American Statistician (2019 Suppl.): Statistical Inference in the 21st Century: A World Beyond p < 0.05
- Wasserstein (2019): Moving to a World Beyond “p < 0.05”
- McShane (2019): Abandon Statistical Significance
- Amrhein (2019): Comment in Nature: Scientists rise up against statistical significance
- Editorial of Nature (2019): It’s time to talk about ditching statistical significance
- Different opinions
- Adams (2019): a trillion P values and counting
- Zhang (2019): P values akin to ‘beyond reasonable doubt’
- 2017: Robert Matthews published an article on how statistical practice changed after the ASA's statement on statistical significance and p-values: The ASA's p-value statement, one year on. I agree with one highlight of this article: "It should be possible to establish firm general principles which focus on what is right rather than what is wrong"
- Abstract: Its aim was to stop the misuse of statistical significance testing. But Robert Matthews argues that little has changed in the 12 months since the ASA's intervention.
- Why do people use p-values instead of computing the probability of the model given the data?
- 2016: ASA releases statement on statistical significance and p-values (03/07/2016). The statement's six principles:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
- The p-value is the probability, computed under the null hypothesis (H0), of observing a result at least as extreme as the one observed, i.e., a conditional probability given H0 (see the sketch after these formulas):
- p(X ≥ x|H0) for a right-tailed event
- p(X ≤ x|H0) for a left-tailed event
- 2 * min{p(X ≥ x|H0), p(X ≤ x|H0)} for a two-tailed event
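As a minimal sketch of these definitions (assuming a test statistic that is standard normal under H0; the observed value x = 1.96 is a made-up number for illustration):

```python
from scipy.stats import norm

x = 1.96  # hypothetical observed test statistic, assumed ~ N(0, 1) under H0

p_right = norm.sf(x)              # p(X >= x | H0), right-tailed event
p_left = norm.cdf(x)              # p(X <= x | H0), left-tailed event
p_two = 2 * min(p_right, p_left)  # two-tailed p-value

print(p_right, p_left, p_two)     # ~0.025, ~0.975, ~0.05
```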
- The α level, the level of significance, is a pre-defined probability of falsely rejecting the null hypothesis that we are willing to accept. Typically, statistical significance means the p-value is below an α level of 0.05.
- False discovery rate (Wikipedia)
- The p-value has often been incorrectly interpreted as a posterior probability of the hypothesis given the observed data, as the false discovery rate (Colquhoun, 2014), or as the false positive rate.
- Storey (2003): The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value
- Type I error rate (α) = 1 - specificity; Type II error rate (β) = 1 - sensitivity; power = sensitivity = 1 - β (illustrated in the sketch below)
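For a concrete illustration of these relationships, consider a one-sided z-test with known variance; the standardized effect size d and sample size n below are hypothetical numbers, not taken from the text above:

```python
from scipy.stats import norm

alpha = 0.05  # Type I error rate
d = 0.5       # hypothetical standardized effect size
n = 30        # hypothetical sample size

z_crit = norm.ppf(1 - alpha)            # critical value of the one-sided z-test
power = norm.sf(z_crit - d * n ** 0.5)  # power = sensitivity = 1 - beta
beta = 1 - power                        # Type II error rate

print(z_crit, power, beta)  # ~1.645, ~0.86, ~0.14
```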
- Leek (2017). Five ways to fix statistics (Nature)
- Benjamin (2017). Redefine statistical significance
- Baker (2016). Statisticians issue warning over misuse of P values (pdf)
- Matloff (2016). 1) After 150 years, the ASA says no to p-value, 2) Further comments on the ASA manifesto, 3) P-values: the continuing saga
- Benjamini & Galili (2016). It's not the p-values' fault reflections on the recent ASA statement
- Kass (2016). Ten Simple Rules for Effective Statistical Practice. It acknowledged three websites:
- xkcd.com "for conveying statistical ideas with humor"
- Simply Statistics "for thoughtful commentary"
- FiveThirtyEight "for bringing statistics to the world (or at least to the media)".
- Aschwanden (2016). Science Isn’t Broken - It’s just a hell of a lot harder than we give it credit for
- Halsey (2015). The fickle P value generates irreproducible results
- Lazzeroni (2016). Solutions for quantifying P value uncertainty and replication power, and Halsey's response
- Nuzzo (2013). Scientific method: Statistical errors
- Epimonitor (2016). Growing Concern About Statistical Errors Triggers Statement of P-Values
- Youngquist (2012). Part 19: What is a P value?
- GraphPad's Advice: how to interpret a small P value
- Frost (2014). How to Correctly Interpret P Values, Five Guidelines for Using P values
- Held (2010). A nomogram for P values
- Cumming (2012). Mind your confidence interval: how statistics skew research results
- Cumming (2013). The problem with p values: how significant are they, really?
- Ioannidis (2005). Why most published research findings are false
- Ranstam (2012). Why the P value culture is bad and confidence intervals a better alternative
- Ranstam (2009). Sampling uncertainty in medical research
- Austin (2002). A brief note on overlapping confidence intervals
- Aschwanden (2016). Statisticians found one thing they can agree on: it's time to stop misusing P-values
- Lew (2013). Give p a chance: significance testing is misunderstood
- Sullivan (2012). Using Effect Size—or Why the P Value Is Not Enough.
- Fraser (2016). Crisis in Science? or Crisis in Statistics! Mixed messages in Statistics with impact on Science
- Gelman (2014). Data-dependent analysis: a "garden of forking paths"
- Capital of Statistics. The American Statistical Association has officially started calling out the (mis)use of P values
- Wikipedia. Type I & II errors, sensitivity & specificity, effect size, Bayes factor.
- Huber (2013). Measures of effect size in Stata 13
- Robert Coe (2002). It's the Effect Size, Stupid
- Andrew Gelman (2013). P value and statistical practice, Misunderstanding the p-value
- Simonsohn (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone
- Goodman (2001). Of P-values and Bayes: a modest proposal
- Goodman (1999). Toward evidence-based medical statistics: the P value fallacy, and The Bayes factor (note: I like Kass's formula, which uses the likelihood of the alternative hypothesis as the numerator and gives a BF without many decimal places).
- Kass (1995). Bayes factors [on the basis of observed data D, for two competing conditions / models / hypotheses (H1 or H0), Bayes factor = p(D|H1)/p(D|H0)]; the rules of thumb below grade the strength of the evidence favoring one hypothesis over the other (a small helper sketch follows this list):
- 1 to 3 (not worth more than a bare mention)
- 3 to 20 (positive)
- 20 to 150 (strong)
- > 150 (very strong)
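A small helper (my own sketch, not code from Kass 1995) that maps a Bayes factor onto these rule-of-thumb categories:

```python
def kass_category(bf):
    """Grade a Bayes factor BF = p(D|H1)/p(D|H0) by the Kass (1995) rules of thumb."""
    if bf < 1:
        return "favors H0 (grade the evidence for H0 with 1/BF)"
    if bf <= 3:
        return "not worth more than a bare mention"
    if bf <= 20:
        return "positive"
    if bf <= 150:
        return "strong"
    return "very strong"

print(kass_category(3.75))  # "positive"
```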
- The odds form of Bayes's theorem for two hypotheses is convenient for calculating a Bayesian update of a probability.
- Given mutually exclusive and exhaustive hypotheses H1 (= alternative hypothesis, e.g., Disease) and H0 (= null hypothesis, e.g., No disease) and observed data/evidence D (e.g., a positive test),
- p(H1 and D) = p(H1|D)*p(D) = p(D|H1)*p(H1)
- p(H1|D) = p(H1)*p(D|H1)/p(D)
- p(H0|D) = p(H0)*p(D|H0)/p(D)
- p(H1|D)/p(H0|D) = p(H1)/p(H0)*p(D|H1)/p(D|H0)
- p(H1|D)/[1-p(H1|D)] = p(H1)/[1-p(H1)]*p(D|H1)/p(D|H0)
- odds(H1|D) = odds(H1)*p(D|H1)/p(D|H0)
- Bayes factor (BF) = p(D|H1)/p(D|H0)
- Prior odds of H1 = odds(H1) = p(H1)/p(H0) = p(H1)/[1-p(H1)]
- Posterior odds of H1 given data = odds(H1|D) = p(H1|D)/p(H0|D)= (prior odds)*(BF or likelihood ratio)
- p(H1)=odds(H1)/[1+odds(H1)], p(H1|D)=odds(H1|D)/[1+odds(H1|D)]
- Suppose a person with the disease has a 3/4 probability of a positive test, and a person without the disease has a 1/5 probability of a positive test, and we know nothing about that person's (the population's) probability of disease, so the prior probability is taken as 50% and the prior odds(Disease) = p(Disease) / (1 - p(Disease)) = 1:1. When the person has a positive test (reproduced in the sketch below):
- the posterior odds: odds(Disease|Pos) = (1:1) * ((3/4) / (1/5)) = 15/4 = 3.75 = odds(Disease) * BF, where BF = p(Pos|Disease) / p(Pos|No disease)
- or, the probability of disease given a positive test: p(Disease|Pos) = ((3/4) * 0.5) / ((3/4) * 0.5 + (1/5) * (1 - 0.5)) = 15/19 ≈ 0.79 = odds(Disease|Pos) / (1 + odds(Disease|Pos))
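The same update as a short sketch, reproducing the numbers above:

```python
p_pos_given_disease = 3 / 4     # p(Pos | Disease)
p_pos_given_no_disease = 1 / 5  # p(Pos | No disease)
prior_prob = 0.5                # no prior information, so 50%

prior_odds = prior_prob / (1 - prior_prob)              # 1:1
bf = p_pos_given_disease / p_pos_given_no_disease       # BF = 3.75
posterior_odds = prior_odds * bf                        # 3.75
posterior_prob = posterior_odds / (1 + posterior_odds)  # 15/19 ~ 0.789

print(posterior_odds, posterior_prob)
```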
- Eight versions of Bayes's Theorem (pdf): simple, explicit, general, Sigma, canceled, odds, relative odds, and compound odds.
- Currell (2009). Chapter 7 Bayesian statistics
- O’Hagan (2006). Bayes factors
- Deeks (2004). Diagnostic tests 4: likelihood ratios
- Lindley (2004). Bayesian thoughts
- Griffis (2006). Statistics and the Bayesian mind
- Berger (1988). The likelihood principle: a review, generalizations, and statistical implications
- Masson (2011): A tutorial on a practical Bayesian alternative to null-hypothesis significance testing
- Faulkenberry (2018): A Simple Method for Teaching Bayesian Hypothesis Testing in the Brain and Behavioral Sciences
- Sellke (2001): Calibration of p Values for Testing Precise Null Hypotheses
- Berger (1987): Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence
- Bayesian p-value
- Nowozin (2015). Bayesian P-Values
- Lin (2009). Using Bayesian p-values in a 2 x 2 table of matched pairs with incompletely classified data
- NISS Webinar (2019): Alternatives to the Traditional P-value