Another presentation I gave at the General Online Research (GOR) conference in March1, was on our first approach to using topic modelling at SKOPOS: How can we extract valuable information from survey responses to open-ended questions automatically? Unsupervised learning is a very interesting approach to this question — but very hard to do right.
This year, the BVM (German professional association for market and social researchers), hosted their first Data Science Cup. There were four tasks involving the prediction of sales data for the online sci-fi game “EVE Online”.
It was my first year working in market research and applying statistics and machine learning algorithms in a real-world context. So, naturally there is much room for improvements to my solution, but I ranked 3rd out of five, so I’m right in the middle. I would do many things differently today, but that’s how it’s supposed to be, right? For example, I would go through with a multilevel model, since the data has a natural hierarchy, that should be incorporated into the analysis.
I have uploaded my solution to a GitHub repository, so you might learn from my mistakes. In the README I have also included some of my reasoning and some technical details. But beware, the code is messy and badly documented – proceed with caution.