ASA statement on p-Values: Improving valid statistical reasoning

A lot of debate (and part of my thesis) revolves around replicability and the proper use of inferential methods. The American Statistical Association has now published a statement on the use and interpretation of p-values (freely available, yay). It sets out six principles for handling p-values. None of them is new in a theoretical sense; the statement is more of a symbolic act to remind scientists to use and interpret p-values properly.

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
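
Principle 5 is easy to see in a toy calculation. The following is a minimal sketch, not anything taken from the ASA statement: it assumes a two-sample z-test with known unit variance, and the function name and the chosen numbers (a standardized difference of 0.2, group sizes of 20 and 2000) are purely illustrative.

```python
import math


def two_sided_z_p_value(effect_size, n_per_group):
    """Two-sided p-value of a two-sample z-test, assuming known unit variance."""
    # With unit variance, the difference of two group means has
    # standard error sqrt(2 / n), so z = effect_size * sqrt(n / 2).
    z = effect_size * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same (hypothetical) effect size, very different sample sizes.
effect = 0.2
p_small_sample = two_sided_z_p_value(effect, n_per_group=20)
p_large_sample = two_sided_z_p_value(effect, n_per_group=2000)
print(p_small_sample, p_large_sample)
```

The effect is identical in both calls; only the sample size changes, yet the first p-value stays well above 0.05 while the second falls far below it. The p-value therefore says nothing by itself about how large the effect is.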

As you can read in the full statement (which I recommend doing), p-values are not dismissed completely.¹ Rather, the ASA clarifies what a p-value means:

Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
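
To make this definition concrete, here is a small sketch of a permutation test for the mean difference between two groups. It is only one possible way to obtain a p-value, and the "specified statistical model" assumed here is that group labels are exchangeable (no difference between groups). The function name, the number of permutations and the example data are all made up for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        # Under the assumed model, group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    # Share of reshuffled data sets that are equal to or more extreme than
    # the observed one; the +1 counts the observed arrangement itself.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
control = [1, 2, 3, 4, 5, 6]
treatment = [11, 12, 13, 14, 15, 16]
print(permutation_p_value(control, treatment))
```

The returned number answers exactly the question in the ASA's definition: if the model without a group difference were true, how often would a mean difference at least as extreme as the observed one occur? It does not tell you how probable that model is.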

Some researchers, however, would like to see p-values banned and replaced by something else, such as Bayes factors, confidence intervals, effect size estimates, …

In my opinion, the very last sentence of the statement makes it clear that every alternative to p-values has to be interpreted with similar care:

No single index should substitute for scientific reasoning.

As others have pointed out,² scientific reasoning is more than interpreting statistical figures: how much weight a particular finding deserves also depends on the study design, the quality of the measurement, the validity of the underlying theory, and the like.

A p-value does present information about the data in the light of a hypothesis. So do Bayes factors, confidence intervals and effect size estimates. Each of them is interpreted differently, though, and each has advantages and disadvantages compared to the others. For the process of reasoning, it is important to choose the statistic that corresponds to the logical argument being made. This is where many publications have failed, and it is what skews interpretations and the (expected) replicability of studies.

If I want to quantify the evidence in my data for or against a hypothesis, Bayes factors might be a better statistic to choose than a p-value. The correct interpretation is – in the first step – the most important part.
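
As a rough illustration of what "evidence for or against a hypothesis" can look like, here is a minimal sketch of a Bayes factor for binomial data. It compares H0 (success probability exactly 0.5) with H1 (success probability unknown, with a uniform prior on [0, 1]). The uniform prior, the function name and the example counts are assumptions made for this sketch; a different prior would give a different Bayes factor.

```python
import math


def bayes_factor_10(successes, trials):
    """Bayes factor for H1 (uniform prior on theta) against H0 (theta = 0.5)."""
    # Likelihood of the data under H0: binomial with theta fixed at 0.5.
    likelihood_h0 = math.comb(trials, successes) * 0.5 ** trials
    # Marginal likelihood under H1: the binomial likelihood integrated over
    # a uniform prior on theta, which equals 1 / (trials + 1).
    marginal_h1 = 1 / (trials + 1)
    return marginal_h1 / likelihood_h0


# Hypothetical data: 100 trials with 50 or 70 successes.
print(bayes_factor_10(50, 100))
print(bayes_factor_10(70, 100))
```

A value below 1 means the data favour H0, a value above 1 means they favour H1. Unlike a p-value, the Bayes factor can thus express support for the null hypothesis, but it still needs the same care in interpretation, not least because it depends on the prior chosen for H1.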

The statement is definitely worth reading, and you should do so if you happen to do statistical inference of any kind.

¹ If you do not want to read the original statement, at least read Alexander Etz's summary of it.
² E.g. Andrew Gelman in his comment on the statement; you can find it, along with comments from other researchers, in the supplementary material.

neurotroph
