Statistical evidence in experimental psychology: an empirical comparison using 855 t tests

R. Wetzels; D. Matzke; M.D. Lee; J.N. Rouder; G.J. Iverson; E.-J. Wagenmakers

doi:10.1177/1745691611406923

Authors	R. Wetzels; D. Matzke; M.D. Lee; J.N. Rouder; G.J. Iverson; E.-J. Wagenmakers
Publication date	2011
Journal	Perspectives on Psychological Science
Volume | Issue number	6 | 3
Pages (from-to)	291-298
Organisations	Faculty of Social and Behavioural Sciences (FMG) - Psychology Research Institute (PsyRes)
Abstract	Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about what hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide additional evidence to p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.
Document type	Article
Language	English
Published at	https://doi.org/10.1177/1745691611406923