We provide an algorithmic solution to a nonstationary bandit problem in which the reward-generating processes are (i) unknown yet knowable (ambiguity) and (ii) subject to sudden changes (“unexpected uncertainty”). The solution combines a Forgetting Bayes Learning approach, which learns the arm values dynamically by discarding past data once a jump has occurred, with a softmax (logit) rule, which chooses between the arms in each trial on the basis of their estimated values. The resulting “Forgetting Bayes algorithm” is tractable and suited to maximizing the earnings accumulated in the task. When tested on simulated data, it outperformed a benchmark based on Hierarchical Bayes Learning and dominated model-free reinforcement learning solutions, which cannot track jumps. Beyond being highly adaptive, the Forgetting Bayes algorithm has plausible neurophysiological foundations, suggesting that it may be the most efficient protocol available to humans.
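
The two ingredients named above, forgetting after a jump and softmax choice, can be illustrated with a minimal Python sketch. It is not the algorithm as specified in this work. It assumes binary rewards, a Beta(1, 1) prior per arm, and a jump signal supplied by the caller, since jump detection is not described here. The names `PRIOR`, `update`, `value`, `softmax_choice`, and `inverse_temperature` are hypothetical.

```python
import math
import random

# Assumption: binary rewards with a uniform Beta(1, 1) prior on each arm.
PRIOR = (1.0, 1.0)


def update(belief, reward, jump):
    """Return the new Beta (alpha, beta) belief for one arm after a 0/1 reward.

    If `jump` is True, evidence gathered before the jump is forgotten:
    the belief restarts from the prior before the new reward is counted.
    How a jump is detected is left to the caller (not specified here).
    """
    alpha, beta = PRIOR if jump else belief
    return (alpha + reward, beta + (1 - reward))


def value(belief):
    """Posterior mean reward probability of an arm."""
    alpha, beta = belief
    return alpha / (alpha + beta)


def softmax_choice(values, inverse_temperature, rng=random):
    """Sample an arm with probability proportional to exp(inverse_temperature * value)."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(inverse_temperature * (v - top)) for v in values]
    total = sum(weights)
    probs = [w / total for w in weights]
    arm = rng.choices(range(len(values)), weights=probs)[0]
    return arm, probs
```

In this sketch, a learner that has seen many rewards from an arm keeps a sharp estimate until a jump is flagged, at which point its estimate returns to the prior and is rebuilt from post-jump outcomes only. A larger `inverse_temperature` makes choices more exploitative.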
