Australasian Mathematical Psychology Conference 2019

Response times and the exploration-exploitation trade-off

Deborah Lin
Melbourne School of Psychological Sciences, University of Melbourne
Daniel R. Little
Melbourne School of Psychological Sciences, University of Melbourne
Philip L. Smith
Melbourne School of Psychological Sciences, University of Melbourne

Many decisions do not come with explicitly stated rewards and outcomes for each option; instead, the decision maker must learn about the decision environment from experience. The environment itself can be stable or changing. Such decisions can be formalised as multi-armed bandit problems, in which decision makers must navigate a trade-off between exploration (i.e., trying out different options) and exploitation (i.e., sticking with a familiar option) while continually tracking the decision environment in order to maximise reward over the entire period of the task. Numerous cognitive models have been developed to account for choice behaviour in these problems. They differ in how much information they use (from partial to full) and in their underlying mechanisms, such as heuristics, Bayesian updating, and sequential sampling. To date, choice response times (RTs) have rarely been analysed or fitted in the exploration-exploitation literature. This project aims to better constrain and discriminate between these models by using RTs as well as choices, and to determine how effort, uncertainty, and reward influence explorative and exploitative behaviour. The experimental paradigm uses a perceptual task rather than an economic decision task, to ensure reliable RTs and to allow fine-grained control over experimental parameters and participant goals. We present model simulations and demonstrate that this paradigm is a promising avenue for investigating the exploration-exploitation trade-off.
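
To make the bandit formalisation concrete, the following is a minimal illustrative sketch of an epsilon-greedy learner on a bandit whose reward probabilities can be stable or drifting. It is not one of the models evaluated in this project, and it produces choices only, not RTs. The function name `simulate_bandit` and the parameters `epsilon`, `alpha`, and `drift_sd` are hypothetical and chosen purely for illustration.

```python
import random


def simulate_bandit(n_trials=200, n_arms=2, epsilon=0.1, alpha=0.1,
                    drift_sd=0.0, seed=0):
    """Simulate an epsilon-greedy learner on a Bernoulli bandit.

    All names and defaults are illustrative assumptions.
    drift_sd = 0.0 gives a stable environment; drift_sd > 0 lets the
    reward probabilities follow a bounded random walk (a changing one).
    """
    rng = random.Random(seed)
    probs = [rng.random() for _ in range(n_arms)]   # true reward probabilities
    estimates = [0.0] * n_arms                      # learner's value estimates
    choices, rewards = [], []

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: try an option at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: stick with the option currently believed to be best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < probs[arm] else 0

        # Delta-rule update with a constant learning rate, so that recent
        # outcomes are weighted more heavily when the environment changes.
        estimates[arm] += alpha * (reward - estimates[arm])

        choices.append(arm)
        rewards.append(reward)

        # Let the environment drift, keeping probabilities within [0, 1].
        probs = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd)))
                 for p in probs]

    return choices, rewards


if __name__ == "__main__":
    stable = simulate_bandit(drift_sd=0.0)
    changing = simulate_bandit(drift_sd=0.05)
    print("mean reward, stable environment:  ", sum(stable[1]) / len(stable[1]))
    print("mean reward, changing environment:", sum(changing[1]) / len(changing[1]))
```

In this sketch, a larger `epsilon` corresponds to more exploration. Accounting for RTs would require replacing the choice rule with a process model, such as a sequential sampling mechanism, which the sketch deliberately leaves out.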