In February and March of this year, I stayed at Eindhoven University of Technology in the amazing group of Daniël Lakens, Anne Scheel and Peder Isager, who are actively researching questions of replicability in psychological science. Over those two months I learned a lot, exchanged some great ideas with the three of them, and was able to work with Daniël on a small overview article.
In the context of the replicability problems in psychology and other empirical fields, statistical significance testing and p-values have received a lot of criticism. And without question: much of that criticism has merit. There certainly are problems with how significance tests are used and how p-values are interpreted.
However, when we talk about “p-hacking”, I feel the blame falls unfairly on p-values and significance testing alone, without acknowledging the general consequences of such behaviour for the analysis. In short: selective reporting of measures and cases invalidates any statistical method for inference. If I selectively report variables and studies, it doesn’t matter whether I use p-values or Bayes factors; both results will be useless in practice.
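The point that selective reporting, not the p-value itself, is the culprit can be made concrete with a small simulation. The sketch below is not from the original post; it assumes a hypothetical researcher who measures 20 outcome variables on pure noise (the null is true for every one) and “publishes” whenever any single measure crosses the conventional 5% threshold.

```python
import numpy as np

rng = np.random.default_rng(42)

def two_sample_t(a, b):
    """Pooled two-sample t statistic for equal group sizes."""
    n = len(a)
    sp = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / (sp * np.sqrt(2 / n))

n_per_group, n_measures, n_studies = 50, 20, 1000
critical = 1.984  # two-sided 5% critical value, df = 98

hits = 0
for _ in range(n_studies):
    # 20 outcome measures, all pure noise: the null hypothesis holds for each
    ts = [two_sample_t(rng.normal(size=n_per_group),
                       rng.normal(size=n_per_group))
          for _ in range(n_measures)]
    # "selective reporting": count the study if ANY measure looks significant
    if max(abs(t) for t in ts) > critical:
        hits += 1

print(f"False-positive rate with selective reporting: {hits / n_studies:.2f}")
```

Each individual test keeps its nominal 5% error rate, but cherry-picking the best of 20 inflates the chance of a spurious finding to roughly 1 − 0.95²⁰ ≈ 64% — and swapping the t-tests for Bayes factors would not change the logic of that inflation at all.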
Recently, I had the opportunity to give a lecture on Bayesian statistics to a class of Psychology Master’s students at the University of Bonn. I’d like to share the slides, which are in German, here for interested readers.
This is an interesting article from The Guardian on “post-truth” politics, where statistics and “experts” are frowned upon by some groups. William Davies shows how statistics in political debate have evolved from the 17th century until today, where statistics are no longer regarded as an objective approach to reality but as an arrogant and elitist tool to dismiss individual experiences. What comes next, however, is not the rule of emotions and subjective experience, but privatised data and data analytics that are available only to a few anonymous analysts in private corporations. This allows populist politicians to buy valuable insight without any accountability — exactly what Trump and Cambridge Analytica did. The article makes the point that this is troublesome for liberal, representative democracies.
My last blog post was on the difference between Sensitivity, Specificity and the Positive Predictive Value. While showing that a positive test result can go along with a low probability of actually having a trait or a disease, that example used the values of Sensitivity and Specificity as known inputs. For established tests and measures, they are indeed often available in the literature, together with recommended cut-off values.
In this post, I would like to show how the choice of a cut-off value influences quality criteria such as Sensitivity, Specificity and the like. If you just want a tool to play with, see my Shiny web application here.
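To make the cut-off’s influence tangible, here is a minimal sketch (my own illustration, with made-up distributions rather than any real test): scores for healthy and diseased groups are drawn from two overlapping normal distributions, and Sensitivity and Specificity are computed for several cut-off values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical test scores: diseased cases score higher on average,
# but the two distributions overlap, so no cut-off separates them perfectly
healthy = rng.normal(loc=0.0, scale=1.0, size=10_000)
diseased = rng.normal(loc=1.5, scale=1.0, size=10_000)

def sens_spec(cutoff):
    sensitivity = np.mean(diseased >= cutoff)  # true positives / all diseased
    specificity = np.mean(healthy < cutoff)    # true negatives / all healthy
    return sensitivity, specificity

for cutoff in (0.0, 0.75, 1.5):
    se, sp = sens_spec(cutoff)
    print(f"cutoff {cutoff:4.2f}: Sensitivity {se:.2f}, Specificity {sp:.2f}")
```

Raising the cut-off makes the test stricter: fewer healthy people are flagged (Specificity rises), but more diseased cases are missed (Sensitivity falls). Choosing a cut-off is therefore always a trade-off between the two.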
A lot of debate (and part of my thesis) revolves around replicability and the proper use of inferential methods. The American Statistical Association has now published a statement on the use and interpretation of p-values (freely available, yay). It includes six principles for handling p-values. None of them is new in a theoretical sense; it is more a symbolic act, reminding scientists to use and interpret p-values properly.
It is bad enough that we have to read and hear the latest failures of thought from right-wing populists (article in German only) and the many relativizations (comments in German only). It seems that 70 years of history classes have not stopped utter racism in public debate.
What sparked my interest, however, was the question of which correlations with birth rates exist. My intuitive expectation was that higher life expectancy is linked to lower birth rates, which might also be explained from an evolutionary perspective.
Now, I’m neither an anthropologist nor familiar with the current state of research, so I can only use openly available statistics. Luckily, the World Bank maintains a large database with various indicators for all countries and regions of the world.
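Once the two indicators have been downloaded from the World Bank database, checking the expected inverse link comes down to a simple correlation. The sketch below uses a handful of illustrative, made-up country figures (not actual World Bank data) just to show the computation:

```python
import numpy as np

# Illustrative numbers only, NOT real World Bank figures: pairs of
# (life expectancy in years, births per woman) for hypothetical countries
life_expectancy = np.array([52.0, 58.0, 64.0, 71.0, 76.0, 81.0, 84.0])
fertility_rate  = np.array([5.9,  5.1,  3.8,  2.6,  2.0,  1.7,  1.4])

# Pearson correlation: a clearly negative r supports the expected inverse link
r = np.corrcoef(life_expectancy, fertility_rate)[0, 1]
print(f"Pearson r = {r:.2f}")
```

With real country-level data the same two lines of NumPy would apply; only the arrays would come from the downloaded indicator files instead of being typed in by hand.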
This thread on StackExchange is circling around my Twitter timeline today and I couldn’t resist sharing it here:
Suppose we have a data set (X_i, Y_i) with n points. We want to perform a linear regression, but first we sort the X_i values and the Y_i values independently of each other, forming the data set (X_i, Y_j). Is there any meaningful interpretation of the regression on the new data set? Does this have a name?
I don’t want to blame the author of the question; it merely shows plain ignorance of basic statistical concepts. At first sight this might be a beginner’s misunderstanding, but this totally kills it:
But my manager says he gets “better regressions most of the time” when he does this […]. I have a feeling he is deceiving himself.
This isn’t incompetence anymore – this is deliberate torture of statistics.
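Why the manager’s “better regressions” are a pure artefact is easy to demonstrate. In this sketch (my own illustration, not from the thread), x and y are only weakly related; sorting both vectors independently makes them monotone increasing and therefore almost perfectly “correlated”, regardless of any real relationship:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weakly related data: true slope 0.3 buried in a lot of noise
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)

def r_squared(x, y):
    """Squared Pearson correlation = R^2 of a simple linear regression."""
    return np.corrcoef(x, y)[0, 1] ** 2

honest = r_squared(x, y)
# the manager's trick: sort x and y independently before regressing
sorted_r2 = r_squared(np.sort(x), np.sort(y))

print(f"honest R^2: {honest:.2f}, after independent sorting: {sorted_r2:.2f}")
```

Two independently sorted samples pair the smallest x with the smallest y, the second-smallest with the second-smallest, and so on, so the fit looks nearly perfect even when x and y are completely unrelated. The sorted “regression” describes a pairing the data never contained.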