On State Representations and Behavioural Modelling Methods in Reinforcement Learning

Siljebråt, Henrik. 2023. On State Representations and Behavioural Modelling Methods in Reinforcement Learning. Doctoral thesis, Goldsmiths, University of London.

COM_thesis_SiljebratH_2023.pdf (Accepted Version, 18MB). Available under a Creative Commons Attribution Non-commercial No Derivatives licence.

Abstract

Reinforcement learning (RL) – algorithms for learning from rewards – has proved successful in the cognitive sciences, explaining both neuronal signals and behaviour in animals, and has produced impressive results in artificial intelligence.

State representations are essential to RL models: given its current state, an animal or artificial agent learns optimal actions by maximizing expected future reward. But how do humans learn and create representations of states?
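
To make the role of the state concrete, here is a minimal tabular Q-learning sketch in Python. It is an illustration, not code from the thesis; the action set and parameter values are assumptions. The point is that everything the agent learns is indexed by its state representation.

    import random
    from collections import defaultdict

    # Minimal tabular Q-learning sketch (illustrative assumptions throughout).
    # The Q-table is indexed by (state, action): whatever "state" means here
    # is the agent's state representation, and it bounds what can be learned.
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed learning parameters
    ACTIONS = [0, 1]                        # assumed two-action task
    Q = defaultdict(float)                  # Q[(state, action)] -> value estimate

    def choose_action(state):
        """Epsilon-greedy choice over the current state's action values."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        """One Q-learning step: nudge Q towards reward + discounted best next value."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

Because Q is indexed by state, two situations mapped to the same state are indistinguishable to the agent; this is why the choice of representation matters.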

This thesis approaches this question on two fronts. First, we thoroughly investigate methods for fitting behavioural models to human lab data. In contrast to recent proposals, we find that the best methods for model selection – determining which model most likely generated the data – are based on maximum likelihood estimation rather than Bayesian inference. We also demonstrate the importance of considering individual differences in model fitting: the model that best fits one participant's behaviour may not best fit another's.
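
To illustrate the general fit-and-compare procedure, the sketch below fits two hypothetical choice models to synthetic data by maximum likelihood and compares them with AIC. The models, data, and criterion are assumptions for illustration; they are not the models or datasets used in the thesis.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    # Illustrative maximum-likelihood model selection: fit each candidate
    # model by minimising negative log-likelihood (NLL), then compare with
    # AIC = 2k - 2*log-likelihood (lower is better).
    rng = np.random.default_rng(0)
    value_diff = rng.normal(size=200)                   # hypothetical trial predictor
    choices = rng.random(200) < expit(2 * value_diff)   # synthetic binary choices

    def nll_bias(params):
        """Model A: constant choice probability p (1 free parameter)."""
        p = np.clip(params[0], 1e-6, 1 - 1e-6)
        return -np.sum(np.where(choices, np.log(p), np.log(1 - p)))

    def nll_logistic(params):
        """Model B: choice probability depends on value difference (2 parameters)."""
        p = np.clip(expit(params[0] + params[1] * value_diff), 1e-6, 1 - 1e-6)
        return -np.sum(np.where(choices, np.log(p), np.log(1 - p)))

    for name, nll, x0 in [("bias", nll_bias, [0.5]), ("logistic", nll_logistic, [0.0, 1.0])]:
        fit = minimize(nll, x0, method="Nelder-Mead")
        aic = 2 * len(x0) + 2 * fit.fun
        print(f"{name}: NLL={fit.fun:.1f}, AIC={aic:.1f}")

Run per participant, this kind of comparison makes the individual-differences point concrete: the lowest-AIC model can differ from one participant to the next.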

Second, we introduce the Shapetask – a novel learning and decision-making task in which participants must find hidden structure in a sequence, without the task explicitly rewarding the appropriate actions. We show that some humans can find this pattern, while standard RL models cannot unless equipped with appropriate state representations. We then show how previously proposed models that integrate RL with complex state representations can account for individual human behaviour in the Shapetask.
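
The abstract does not specify the Shapetask itself, so the sketch below uses a hypothetical analogue: a repeating stimulus stream that a simple count-based predictor (standing in for a learner) can only master when its state includes the previous stimulus as well as the current one.

    import itertools
    from collections import defaultdict

    # Hypothetical analogue of a sequence task (not the real Shapetask): the
    # repeating stream A B B A B B ... is only fully predictable if the
    # agent's state includes the previous stimulus.
    STREAM = list(itertools.islice(itertools.cycle("ABB"), 3000))

    def accuracy(make_state):
        """Predict the next stimulus from counts indexed by make_state(prev, cur)."""
        counts = defaultdict(lambda: defaultdict(int))
        correct = total = 0
        for prev, cur, nxt in zip(STREAM, STREAM[1:], STREAM[2:]):
            state = make_state(prev, cur)
            if counts[state]:
                guess = max(counts[state], key=counts[state].get)
                correct += guess == nxt
                total += 1
            counts[state][nxt] += 1
        return correct / total

    print("state = current only:   ", round(accuracy(lambda p, c: c), 2))       # ~0.67
    print("state = (prev, current):", round(accuracy(lambda p, c: (p, c)), 2))  # ~1.0

With the current stimulus alone the predictor plateaus around 67% on this stream; adding the previous stimulus to the state makes the pattern fully predictable.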

We argue that our results add to a growing literature indicating a broader role for dopamine, one involving general sensory prediction errors rather than only reward prediction errors. Further, we argue that the Shapetask holds promise for further research on state representation and task structure. Such research may illuminate the workings of animal brains and contribute to artificial intelligence, where better models of state representation could improve data efficiency and generalisation over current-generation systems.

Item Type:

Thesis (Doctoral)

Identification Number (DOI):

https://doi.org/10.25602/GOLD.00033671

Keywords:

animals, learning, decision making, reinforcement learning, state representations, representation learning, computer science, psychology, artificial intelligence, neuroscience, computational neuroscience, behavioural modelling, methods, statistical modelling, model comparison

Departments, Centres and Research Units:

Computing

Date:

30 April 2023

Item ID:

33671

Date Deposited:

20 Jun 2023 11:39

Last Modified:

26 Jun 2023 14:52

URI:

https://research.gold.ac.uk/id/eprint/33671
