In the context of the replicability problems in psychology and other empirical fields, statistical significance testing and *p*-values have received a lot of criticism. And without question: much of this criticism has merit. There certainly are problems with how significance tests are used and how *p*-values are interpreted.^{1}

However, when we talk about “*p*-hacking”, I feel that the blame falls unfairly on *p*-values and significance testing alone, without acknowledging the consequences such behaviour has for any analysis.^{2} In short: selective reporting of measures and cases^{3} invalidates every statistical method of inference. If I selectively report variables and studies, it doesn’t matter whether I use *p*-values or Bayes factors — both results will be useless in practice.
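The point can be made concrete with a small simulation. The sketch below (my own illustration, not from the original post; all names and the specific numbers are assumptions) simulates studies with *no* true effect. An honest analyst tests one pre-registered variable; a *p*-hacker measures 20 variables and reports only the best one. Selection inflates the false-positive rate far beyond the nominal 5% — and it distorts a Bayes factor just as badly, here computed via a rough BIC-based approximation in the spirit of Wagenmakers (2007):

```python
import math
import random

random.seed(1)

N = 30          # observations per variable
K = 20          # variables measured per study; only the "best" gets reported
STUDIES = 2000  # simulated studies, all with NO true effect


def p_and_z(sample):
    """Two-sided z-test of mean == 0, assuming known sd = 1."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return p, z


def bf10_bic(z, n):
    """Rough BIC-based Bayes factor approximation:
    BF10 ~ exp((z^2 - ln n) / 2). For illustration only."""
    return math.exp((z * z - math.log(n)) / 2.0)


honest_hits = selective_hits = 0
honest_bfs, selective_bfs = [], []

for _ in range(STUDIES):
    results = [p_and_z([random.gauss(0, 1) for _ in range(N)])
               for _ in range(K)]
    # Honest analyst: report the one pre-registered variable.
    p1, z1 = results[0]
    honest_hits += p1 < 0.05
    honest_bfs.append(bf10_bic(z1, N))
    # p-hacker: measure K variables, report only the smallest p.
    p_min, z_min = min(results)
    selective_hits += p_min < 0.05
    selective_bfs.append(bf10_bic(z_min, N))

median = lambda xs: sorted(xs)[len(xs) // 2]
print(f"honest false-positive rate:    {honest_hits / STUDIES:.3f}")
print(f"selective false-positive rate: {selective_hits / STUDIES:.3f}")
print(f"median honest BF10:    {median(honest_bfs):.2f}")
print(f"median selective BF10: {median(selective_bfs):.2f}")
```

Note that the Bayes factor is fooled for the same reason the *p*-value is: picking the extreme of 20 draws biases *any* evidence measure computed from the reported data, which is exactly the argument above.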