How statistics lost their power – and why we should fear what comes next

This is an interesting article from The Guardian on “post-truth” politics, in which some groups frown upon statistics and “experts”. William Davies traces how the role of statistics in political debate has evolved from the 17th century until today, when statistics are no longer regarded as an objective approach to reality but as an arrogant and elitist tool for dismissing individual experience. What comes next, however, is not the rule of emotions and subjective experience, but privatised data and data analytics that are available only to a few anonymous analysts in private corporations. This allows populist politicians to buy valuable insight without any accountability, which is exactly what Trump and Cambridge Analytica did. The article argues that this is troubling for liberal, representative democracies.


Choosing Cut-Offs in Tests

My last blog post was on the difference between Sensitivity, Specificity and the Positive Predictive Value. While it showed that a positive test result can correspond to a low probability of actually having a trait or a disease, the example took the values of Sensitivity and Specificity as known inputs. For established tests and measures, they are indeed often available in the literature together with recommended cut-off values.
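To make the point about low probabilities concrete, here is a minimal sketch of the Positive Predictive Value computed with Bayes' rule. The function name and the input values (90% Sensitivity, 90% Specificity, 1% prevalence) are made up for illustration and are not taken from the earlier post.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of having the trait given a positive test (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: a fairly good test applied to a rare trait.
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.90, prevalence=0.01)
print(f"PPV = {ppv:.3f}")  # about 0.083
```

Under these assumed numbers, only about 8% of the positive results belong to people who actually have the trait, even though the test itself looks decent.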

In this post, I would like to show how the choice of a cut-off value influences quality criteria such as Sensitivity, Specificity and the like. If you just want a tool to play with, see my Shiny web application here.
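As a rough sketch of that idea (not the code behind the Shiny application), the following example moves a cut-off across a tiny, invented set of test scores and reports Sensitivity and Specificity at each value. The scores, the labels and the rule "score at or above the cut-off counts as positive" are all assumptions chosen for illustration.

```python
def sensitivity_specificity(scores, has_trait, cutoff):
    """Treat score >= cutoff as a positive test; return (sensitivity, specificity)."""
    pairs = list(zip(scores, has_trait))
    tp = sum(1 for s, t in pairs if t and s >= cutoff)      # true positives
    fn = sum(1 for s, t in pairs if t and s < cutoff)       # false negatives
    tn = sum(1 for s, t in pairs if not t and s < cutoff)   # true negatives
    fp = sum(1 for s, t in pairs if not t and s >= cutoff)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: five people without the trait, five with it.
scores = [2, 3, 4, 5, 6, 5, 6, 7, 8, 9]
has_trait = [False] * 5 + [True] * 5

for cutoff in (4, 6, 8):
    sens, spec = sensitivity_specificity(scores, has_trait, cutoff)
    print(f"cut-off {cutoff}: sensitivity = {sens:.1f}, specificity = {spec:.1f}")
# cut-off 4: sensitivity = 1.0, specificity = 0.4
# cut-off 6: sensitivity = 0.8, specificity = 0.8
# cut-off 8: sensitivity = 0.4, specificity = 1.0
```

In this toy data, raising the cut-off trades Sensitivity for Specificity, which is the trade-off any choice of cut-off has to balance.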


ASA statement on p-Values: Improving valid statistical reasoning

A lot of debate (and part of my thesis) revolves around replicability and the proper use of inferential methods. The American Statistical Association has now published a statement on the use and interpretation of p-values (freely available, yay). It includes six principles on how to handle p-values. None of them is new in a theoretical sense; the statement is more of a symbolic act to remind scientists to use and interpret p-values properly.


Birth Rates and Life Expectancy

It is bad enough that we have to read and hear the current failures of thought by right-wing populists (article in German only) and the many attempts to relativize them (comments in German only). It seems that 70 years of history class have not helped to stop utter racism in public debate.

What sparked my interest, however, was the question of which variables correlate with birth rates. My intuitive expectation was that higher life expectancy is linked to lower birth rates, which might also be explained from an evolutionary perspective.
Now, I’m neither an anthropologist nor familiar with the current state of research, and I can only use openly available statistics. Luckily, the World Bank has a large database with various indicators for all countries and regions of the world.


Sorting Data independently before Regression

This thread on StackExchange is circulating on my Twitter timeline today and I couldn’t resist sharing it here:

Suppose we have data set (X_i, Y_i) with n points. We want to perform a linear regression, but first we sort the X_i values and the Y_i values independently of each other, forming data set (X_i, Y_j). Is there any meaningful interpretation of the regression on the new data set? Does this have a name?

I don’t want to blame the author of the question. The question simply displays plain ignorance of basic statistical concepts. At first sight this might be a beginner’s misunderstanding, but this really tops it:

But my manager says he gets “better regressions most of the time” when he does this […]. I have a feeling he is deceiving himself.

This isn’t incompetence anymore – this is deliberate torture of statistics.
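Why the manager gets “better regressions” can be illustrated with a small simulation. The sketch below uses made-up, independently drawn random data (not anything from the StackExchange thread) and compares the Pearson correlation before and after sorting both variables separately; the helper `pearson_r`, the seed and the sample size are arbitrary choices for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Two variables that are unrelated by construction (arbitrary seed).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [rng.gauss(0, 1) for _ in range(200)]

r_original = pearson_r(x, y)                # pairs kept intact
r_sorted = pearson_r(sorted(x), sorted(y))  # pairing destroyed by sorting

print(f"r with original pairs:      {r_original:.3f}")
print(f"r after independent sorts:  {r_sorted:.3f}")
```

Sorting each variable on its own pairs the smallest X with the smallest Y, the second smallest with the second smallest, and so on. That forces a monotone relationship onto the data, so the fit looks excellent even though the original pairs carry no relationship at all.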