Last weekend, I gave a 1.5-day workshop for students at my university on data analysis using R. In this post I briefly share my experience along with the workshop slides and an example project – both of which are in German. If you are looking for an English introduction to R, have a look at Hadley Wickham’s excellent “R for Data Science”, which you can find here.
In our psychology programme, as at many other universities, students are taught to use SPSS. The focus is essentially on applying the common hypothesis tests through the menus. So far, R has only been mentioned in passing, as an alternative for when the research questions become somewhat more demanding. In the context of the open science discussions, however, R has also become an important building block for reproducible analyses and the use of free software.
Continue reading “Workshop “Einführung in die Datenanalyse mit R” (Post and Slides in German)”
Back in December, I blogged about the ReplicationBF package, which I made available on GitHub. It allows you to calculate Replication Bayes Factors for t- and F-tests. The preprint detailing the formulas for the latter was outdated and the method in the package was not optimal, so I recently updated both.
Continue reading “Update on the Replication Bayes Factor”
Another presentation I gave at the General Online Research (GOR) conference in March was on our first approach to using topic modelling at SKOPOS: How can we automatically extract valuable information from survey responses to open-ended questions? Unsupervised learning is a very interesting approach to this question, but very hard to do right.
Continue reading “Using Topic Modelling to learn from Open Questions in Surveys”
At the GOR conference in Cologne two weeks ago, I had the opportunity to give a talk on replicability in Online Research. As a PhD student researching this topic and working as a data scientist in market research, I was very happy to share my thoughts on how the debate in psychological science might transfer to online and market research.
The GOR conference is quite unique since the audience is about half academics and half commercial practitioners from market research. I noticed my filter bubble when only about a third of the audience knew about the “replicability crisis in psychology” (Pashler & Wagenmakers, 2012; Pashler & Harris, 2012).
Continue reading “Replicability in Online Research”
In a recent post, I mentioned a replication study we performed. We have now finalised the manuscript and uploaded it as a pre-print to PsyArXiv.
Update (25.04.2018): The paper is now published at Royal Society Open Science and available here.
Continue reading “New Preprint: Does it Actually Feel Right?”
In the context of problems with replicability in psychology and other empirical fields, statistical significance testing and p-values have received a lot of criticism. And without question: much of the criticism has its merits. There certainly are problems with how significance tests are used and p-values are interpreted.
However, when we are talking about “p-hacking”, I feel that the blame is placed unfairly on p-values and significance testing alone, without acknowledging the general consequences of such behaviour for the analysis. In short: selective reporting of measures and cases invalidates any statistical method for inference. When I only selectively report variables and studies, it doesn’t matter whether I use p-values or Bayes factors: both results will be useless in practice.
Continue reading “p-hacking destroys everything (not only p-values)”
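A minimal simulation sketch of this argument, which is not taken from the post itself: every measure has a true effect of zero, and an "honest" analysis reports the first measure while a "selective" one reports whichever of the measures looks most extreme. The number of measures, the sample size, the thresholds (p < 0.05, BF10 > 3), the unit-information-style normal prior with `tau = 1` and all function names are assumptions made for illustration. The code uses Python with the standard library only.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def bf10(z, n, tau=1.0):
    """Bayes factor for H1: mu ~ N(0, tau^2) against H0: mu = 0.

    Assumes n observations with known standard deviation 1,
    summarised by z = sample mean * sqrt(n).
    """
    r = n * tau ** 2
    return math.sqrt(1.0 / (1.0 + r)) * math.exp(0.5 * z ** 2 * r / (1.0 + r))


def simulate(n_sims=2000, n_measures=5, n=30, seed=1):
    """Simulate studies in which every measure has a true effect of zero."""
    rng = random.Random(seed)
    counts = {"p_honest": 0, "p_selective": 0, "bf_honest": 0, "bf_selective": 0}
    for _ in range(n_sims):
        zs = []
        for _ in range(n_measures):
            sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
            zs.append(sum(sample) / n * math.sqrt(n))
        z_first = zs[0]            # honest: the measure chosen in advance
        z_best = max(zs, key=abs)  # selective: the most extreme measure
        counts["p_honest"] += two_sided_p(z_first) < 0.05
        counts["p_selective"] += two_sided_p(z_best) < 0.05
        counts["bf_honest"] += bf10(z_first, n) > 3.0
        counts["bf_selective"] += bf10(z_best, n) > 3.0
    # Share of null studies that end up reporting "evidence" for an effect
    return {name: count / n_sims for name, count in counts.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name}: {rate:.3f}")
```

Under these assumptions, the selective rates should come out clearly above the honest ones for both criteria, which is the point of the paragraph above: the inflation comes from the selection step, not from the choice of statistic.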
I rarely read pop-sci books, and I even more rarely review books in any form. However, I bought “Everybody Lies” some months ago and have just finished it. It took me about four months to read, partly because it made me so angry as a researcher.
Continue reading “Book Review: Everybody Lies”
Recently, I had the opportunity to give a lecture on Bayesian statistics to a cohort of psychology Master’s students at the University of Bonn. I’d like to share the slides, which are in German, here for interested readers.
Continue reading “Introduction to Bayesian Statistics (Slides in German)”
Some months ago, I wrote a manuscript on how to calculate Replication Bayes factors for replication studies involving F-tests, as is usually the case for ANOVA-type studies.
After a first round of peer review, I have revised the manuscript and updated all the R scripts. I have also written a small R package that collects all the functions in one place. You can find the package at my GitHub repository. Thanks to devtools and roxygen2, the documentation should contain the most important information on how to use the functions. Reading the original paper and my extension should help clarify the underlying considerations and how to apply the RBF in a given situation.
I will update the preprint at arXiv soon, too, and add some more theoretical notes here on the blog about my perspective on the use of Bayes factors. In the meantime, you might also be interested in Ly et al.’s updated approach to the Replication Bayes factor, which is not yet covered in either my manuscript or the R package.
Please post bugs and problems with the R package to the issue tracker at GitHub.
This year, the BVM (German professional association for market and social researchers) hosted their first Data Science Cup. There were four tasks involving the prediction of sales data for the online sci-fi game “EVE Online”.
It was my first year working in market research and applying statistics and machine learning algorithms in a real-world context. So, naturally, there is much room for improvement in my solution, but I ranked 3rd out of five, so I’m right in the middle. I would do many things differently today, but that’s how it’s supposed to be, right? For example, I would go through with a multilevel model, since the data has a natural hierarchy that should be incorporated into the analysis.
I have uploaded my solution to a GitHub repository, so you might learn from my mistakes. In the README I have also included some of my reasoning and some technical details. But beware, the code is messy and badly documented – proceed with caution.