    multi-armed bandit

Explore "multi-armed bandit" with insightful episodes like "Multi-Armed Bandits: Learning better decisions" from podcasts like "DataCafé" and more!

    Episodes (1)

    Multi-Armed Bandits: Learning better decisions

Have you ever wondered why you keep getting adverts for products that you've only just bought and now don't need? The automated ad server is probably using a multi-armed bandit learner that needs a little algorithmic improvement. We speak to Ciara Pike-Burke about her work on trying to make multi-armed bandits smarter and more useful.

The multi-armed bandit problem is a classic reinforcement learning problem: we are given a slot machine with n arms (the "bandits"), each arm having its own rigged probability distribution of success. Pulling an arm yields a stochastic reward of either R=+1 for success or R=0 for failure. Our objective is to pull the arms one by one, in sequence, so as to maximize the total reward collected in the long run. In the world of data science, the bandit arms are possible decisions that can be taken, the reward is the possible win you get from taking a decision, and the uncertainty in the problem is what makes it hard (and exciting!).
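The episode doesn't prescribe a particular algorithm, but the setting above can be sketched with a simple epsilon-greedy learner on Bernoulli arms (the arm probabilities, pull budget, and epsilon below are illustrative assumptions, not from the episode):

```python
import random

def epsilon_greedy(success_probs, n_pulls=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a Bernoulli bandit.

    success_probs: the hidden success probability of each arm
    (reward R=+1 on success, R=0 on failure).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # how many times each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # explore: pull a random arm
            arm = rng.randrange(n_arms)
        else:
            # exploit: pull the arm with the best estimated mean so far
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        # incremental update of the running mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough pulls the learner concentrates on the highest-probability arm while still spending a fraction epsilon of pulls exploring, which is exactly the explore/exploit trade-off that makes the problem interesting.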

    With interview guest Dr. Ciara Pike-Burke from the Universitat Pompeu Fabra (Barcelona)
    https://sites.google.com/view/cpikeburke

    Further Reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 29 Jan. 2020
    Interview date: 10 Jan. 2020

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast, and feel free to contact us about anything you've heard here or anything you think would be an interesting topic in the future.