In September, my contract as a research assistant at the University of Bonn ended. I was lucky to have a 50% contract for three years and even luckier to have the option to extend it for another year. Nevertheless, I will leave academia, as I’m close to finishing my PhD thesis and had planned to work outside of the university from the beginning.
Psychological science is one of the fields that is undergoing drastic changes in how we think about research, conduct studies and evaluate previous findings. Most notably, many studies from well-known researchers are under increased scrutiny. Recently, journalists and researchers have reviewed the Stanford Prison Experiment that is closely associated with the name of Philip Zimbardo. Many consider Zimbardo a “prestigious” psychologist. In the discussion about how we should think about doing science in times of the “replicability crisis”, the issue of “prestige” comes up in different forms. Among the recurring questions: Should we trust what prestigious researchers say? Should we put more faith in articles published in prestigious journals?
It seems as if some consider prestige to be something bad: Prestige is not earned through hard work but through capitalizing on past successes and putting oneself in the spotlight. In my experience, people who hold this view often criticize journals or conferences for inviting only prestigious authors or speakers. Others (knowingly or unknowingly) put a lot of trust in prestige, for example by accepting theories from prestigious sources more easily.
Daniël told me about this the other day: Our recent pre-print on informative ‘null effects’ is now cited in the submission criteria for Psychological Science in a paragraph on drawing inferences from ‘theoretically significant’ results that may not be ‘statistically significant’. I feel very honoured that the editorial board at PS considers our manuscript a good reference. To me, this also shows the importance and usefulness of pre-prints: The manuscript is not yet published in a journal and has already been well received through this blog and Twitter. Yay, future!
In February and March this year, I stayed at the Eindhoven University of Technology in the amazing group with Daniël Lakens, Anne Scheel and Peder Isager, who are actively researching questions of replicability in psychological science. Over the two months I learned a lot, exchanged some great ideas with the three of them, and was able to work together with Daniël on a small overview article.
Back in December, I blogged about the ReplicationBF package that I made available on GitHub. It allows you to calculate Replication Bayes Factors for t- and F-tests. The preprint detailing the formulas for the latter was outdated and the method in the package was not optimal, so I recently updated both.
Another presentation I gave at the General Online Research (GOR) conference in March was on our first approach to using topic modelling at SKOPOS: How can we automatically extract valuable information from survey responses to open-ended questions? Unsupervised learning is a very interesting approach to this question, but very hard to do right.
At the GOR conference in Cologne two weeks ago, I had the opportunity to give a talk on replicability in Online Research. As a PhD student researching this topic and working as a data scientist in market research, I was very happy to have the opportunity to give my thoughts on how the debate in psychological science might transfer to online and market research.
The GOR conference is quite unique since the audience is about half academics and half commercial practitioners from market research. I noticed my filter bubble when only about a third of the audience knew about the “replicability crisis in psychology” (Pashler & Wagenmakers, 2012; Pashler & Harris, 2012).
In the context of problems with replicability in psychology and other empirical fields, statistical significance testing and p-values have received a lot of criticism. And without question: much of the criticism has its merits. There certainly are problems with how significance tests are used and p-values are interpreted.
However, when we are talking about “p-hacking”, I feel that the blame is unfairly placed on p-values and significance testing alone, without acknowledging the general consequences of such behaviour for the analysis. In short: selective reporting of measures and cases invalidates any statistical method for inference. When I only selectively report variables and studies, it doesn’t matter whether I use p-values or Bayes factors: both results will be useless in practice.
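To make the point about selective reporting a bit more concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not taken from any real study: each simulated study has five independent outcome measures with no true effect, the standard deviation is treated as known (so a simple z-test can be used), and the Bayes factor compares a point null against a Normal(0, tau²) prior on the mean with tau = 1. The thresholds p < .05 and BF10 > 3 are just common conventions.

```python
import math
import random


def z_statistic(sample):
    """z statistic for H0: mean = 0, assuming a known standard deviation of 1."""
    n = len(sample)
    return (sum(sample) / n) * math.sqrt(n)


def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def bayes_factor_10(z, n, tau=1.0):
    """BF10 for H1: mean ~ Normal(0, tau^2) against H0: mean = 0 (sd known to be 1)."""
    shrink = n * tau**2 / (1 + n * tau**2)
    return math.sqrt(1 - shrink) * math.exp(0.5 * z * z * shrink)


def simulate(n_studies=4000, n_measures=5, n=30, seed=1):
    """Simulate studies without any true effect and compare two reporting styles.

    'honest' always reports the first measure; 'selective' reports whichever
    measure has the largest |z|, i.e. the most impressive-looking result.
    Returns the proportion of studies that cross each evidence threshold.
    """
    rng = random.Random(seed)
    counts = {"honest_p": 0, "selective_p": 0, "honest_bf": 0, "selective_bf": 0}
    for _ in range(n_studies):
        zs = [
            z_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_measures)
        ]
        honest = zs[0]
        selective = max(zs, key=abs)  # cherry-pick the strongest measure
        counts["honest_p"] += two_sided_p(honest) < 0.05
        counts["selective_p"] += two_sided_p(selective) < 0.05
        counts["honest_bf"] += bayes_factor_10(honest, n) > 3
        counts["selective_bf"] += bayes_factor_10(selective, n) > 3
    return {name: count / n_studies for name, count in counts.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name}: {rate:.3f}")
```

Under these assumptions, honest reporting should cross p < .05 in roughly 5% of the null studies, while picking the best of five independent measures pushes that towards 1 − 0.95⁵ ≈ 23%. The share of studies with BF10 > 3 goes up in the same way, because this Bayes factor grows with |z| just as the p-value shrinks with it. The exact numbers depend on the chosen prior, sample size and number of measures, but the direction of the bias does not.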