Replicability in Online Research


At the GOR conference in Cologne two weeks ago, I had the opportunity to give a talk on replicability in online research. As a PhD student researching this topic and working as a data scientist in market research, I was very happy to share my thoughts on how the debate in psychological science might transfer to online and market research.

The GOR conference is quite unique since the audience is about half academics and half commercial practitioners from market research. I noticed my own filter bubble when only about a third of the audience had heard of the “replicability crisis in psychology” (Pashler & Wagenmakers, 2012; Pashler & Harris, 2012).


You can download the slides here.
Below you will find a written summary of some of the thoughts from the presentation.


Replication Crisis in Psychology

In just two sentences: in psychological science, several teams were unable to successfully replicate previous findings. The largest project was the “Reproducibility Project: Psychology” (Open Science Collaboration, 2015), but there have also been the Many Labs projects and many meta-analyses in recent years.

In general: the rate of successful replications is alarmingly low (as is the number of relevant published direct replications).

The problem also exists in other disciplines (economics, biomedicine, cancer research, …), but – it might be the filter bubble again – psychology sometimes feels like the discipline most active in trying to overcome this “crisis”.

Online and Commercial Market Research

So, what about online research? Online research, as an academic field and as the basis for applied market research, shares theoretical and methodological foundations with psychological and sociological research.

Moreover, commercial market research currently faces an important debate about the quality of its work, not only because of the recent attention to fraud cases in the industry.1 The only reasonable thing to do, I believe, is thus to learn from the debate and developments in psychological science in general and social psychology in particular.

But what does replicability even mean in a research setting that is ad hoc by nature and where we expect longitudinal effects (e.g. changes in brand image through advertising campaigns)? Imagine a company giving the exact same project requirements to two different research institutes. Both of them will do a great job and have a perfectly justified research plan – but will they come to the same conclusions, leading the company to the same decisions?2

A 15-minute talk is very little time to actually go into detail on the causes of the problems and the recommendations. So, I gave only a brief overview as a starting point for further discussions.

One problem with many studies, especially in social psychology, has been notoriously small sample sizes: samples of 30 to 50 WEIRD3 participants spread across two or three between-subjects conditions. With so few participants, only very large effects reach significance, resulting in published effect size estimates of d = .8 and beyond (a quick simulation after this paragraph illustrates the point).
Luckily, this is one issue that is less prevalent in online research (in general and in market research in particular), because large, heterogeneous, representative, and even cross-cultural samples are quite easily available and common practice.
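To make this concrete, here is a minimal simulation sketch (my own illustration, not part of the talk; the true effect of d = 0.4 and the group size of 30 are assumptions I picked for the example). It shows that, with samples this small, the studies that happen to reach p < .05 report effect estimates well above the true effect.

```python
# Sketch: effect size inflation in small between-subjects studies.
# Assumes a modest true effect (d = 0.4) and n = 30 per group; both
# numbers are hypothetical and only serve the illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n_per_group, n_studies = 0.4, 30, 10_000

significant_ds = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    # Cohen's d estimated from the two samples (pooled standard deviation)
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    d_hat = (treatment.mean() - control.mean()) / pooled_sd
    if p < 0.05:
        significant_ds.append(d_hat)

print(f"Share of 'significant' studies: {len(significant_ds) / n_studies:.2f}")
print(f"Mean estimated d among them:    {np.mean(significant_ds):.2f}")
# With n = 30 per group, only estimates of roughly d >= 0.5 can reach
# p < .05, so the significant studies report about d = 0.7 on average,
# well above the true effect of 0.4.
```

In other words, when small samples are combined with a significance filter, the published estimates are systematically inflated, which is one reason replications with larger samples tend to find smaller effects.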

Recommendations

Which recommendations can be applied to online research, then? For academic online research, I think the most relevant approaches are Pre-registrations or Registered Reports (Chambers et al., 2015) and Open Science principles such as Open Data and Open Materials. There will be cases where neither is possible, but this is also true in psychological science (think, e.g., of organisational psychology, where the confidentiality of samples is very important).

For commercial market research, though, Open Data and Open Materials will more often than not be impossible. On the other hand, the incentive structure is completely different from academia's: as market researchers have an obligation to the customer, the problem of publication bias and the “file drawer” is much less relevant.

There are still many degrees of freedom in designing a study and carrying out the analysis – another major concern in this context. These can be reduced by laying out not only the study design but also the data analysis precisely in the proposal to the client (effectively a pre-registration).

One might say that market research is a lot less theory-driven and construct-oriented than scientific research – but I might counter that most of psychology is also a lot less theory-driven than many consider it. Our theories are often weak and vague, and cannot make precise predictions for future experiments. “Customer satisfaction”, on the other hand, might also be a more complex construct than a single item in a questionnaire can capture. This means we should take more care about measurement error and proper statistical analysis in both psychology and online research.

Statistical analysis in market research is primarily exploratory, but p-values are still all over the place, and they are misused and misinterpreted just as they are in science. Especially in an applied setting, effect sizes might be more informative and can be translated into units relevant to the customer – see the sketch below.
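As a minimal sketch of what that could look like (the A/B scenario and all numbers are hypothetical, not from the talk): report the difference in conversion rates with a confidence interval and translate it into additional purchases per 10,000 exposed customers, rather than only reporting whether p < .05.

```python
# Sketch: report an effect size with a confidence interval in
# customer-relevant units. The campaign data below are made up.
import numpy as np

# Hypothetical A/B test: purchases out of exposed customers per campaign.
conversions = np.array([230, 300])
exposed = np.array([5000, 5000])

p_a, p_b = conversions / exposed
diff = p_b - p_a

# Normal-approximation 95% confidence interval for the difference in proportions.
se = np.sqrt(p_a * (1 - p_a) / exposed[0] + p_b * (1 - p_b) / exposed[1])
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

print(f"Conversion rates: {p_a:.1%} vs. {p_b:.1%}")
print(f"Difference: {diff:.2%} (95% CI {ci_low:.2%} to {ci_high:.2%})")
print(f"Roughly {diff * 10_000:.0f} additional purchases per 10,000 exposed "
      f"customers (95% CI {ci_low * 10_000:.0f} to {ci_high * 10_000:.0f}).")
```

A statement like “about 140 additional purchases per 10,000 exposed customers, plus or minus the uncertainty shown by the interval” is usually far more useful to a client than “the difference is significant at p < .05”.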

These are just some recommendations (a few more are at least mentioned in the slides), but I hope that we can start a discussion on how to implement steps to increase the reliability of market research. This is directly related to the quality of market research, and we can learn a lot from the ongoing debates in science.

Concluding Comments

Do we even have a crisis of replicability? Gilbert et al. (2016) criticised the RP:P on several grounds, and their comment is often used as a counterargument to the need for more openness and replications in science. It is important to also consider the response by Anderson et al. (2016), as it highlights several misconceptions in the Gilbert et al. (2016) comment. The statistical arguments have also been rebutted, e.g. by Daniel Lakens.

The RP:P is certainly not unequivocal evidence that our field is doomed. Taken together, however, the RP:P, the Many Labs projects, several meta-analyses, and projects such as Curate Science show quite clearly that we do have a problem in science. This is not only due to publication bias, and it cannot be explained solely by hidden moderators.

I very much agree with a comment from the audience that replication studies need to adhere to the same scientific principles and standards that replicators apply to originally published research. Unfortunately, this has, in my view, led to a “tone debate” and distracted from the issues in research practices that need fixing.4

One last thing that I feel is important: Data Science and Machine Learning are highly relevant to commercial market research. They will change and improve our work. But they are no holy grail and will not fix the issues raised here – in fact, mindlessly applying Neural Networks to any and every question or problem in market research will only make matters worse. But that might well fill a whole other blog post…

References

  • Anderson, C. J., Bahník, Š., Barnett-Cowan, M., Bosco, F. A., Chandler, J., Chartier, C. R., … Zuni, K. (2016). Response to Comment on “Estimating the reproducibility of psychological science.” Science, 351(6277), 1037. http://doi.org/10.1126/science.aad9163
  • Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P., & Willmes, K. (2015). Registered Reports: Realigning incentives in scientific publishing. Cortex, 66, A1–A2. http://doi.org/10.1016/j.cortex.2015.03.022
  • Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on “Estimating the reproducibility of psychological science.” Science, 351(6277), 1037. http://doi.org/10.1126/science.aad7243
  • Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. http://doi.org/10.1017/S0140525X0999152X
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. http://doi.org/10.1126/science.aac4716
  • Pashler, H., & Harris, C. R. (2012). Is the Replicability Crisis Overblown? Three Arguments Examined. Perspectives on Psychological Science, 7(6), 531–536. http://doi.org/10.1177/1745691612463401
  • Pashler, H., & Wagenmakers, E. (2012). Editors’ Introduction to the Special Section on Replicability in Psychological Science. Perspectives on Psychological Science, 7(6), 528–530. http://doi.org/10.1177/1745691612465253
  1. As an interesting side note: the debate on replicability in psychology is closely linked to cases of scientific misconduct and data manipulation. I don’t think this is a coincidence, even if a lack of replicability is only partly due to actual fraud.
  2. At the presentation I was told that similar questions have been investigated in the past, mostly with the result that the findings differed. Unfortunately, I currently cannot find the references to the projects mentioned.
  3. Western, Educated, Industrialised, Rich, Democratic. See Henrich, Heine, & Norenzayan (2010). You might want to add “female” and “undergraduate psychology students” to the list.
  4. See e.g. the very bad article at The Boston Globe.
