    multi-armed bandit

Explore "multi-armed bandit" with insightful episodes like "Multi-Armed Bandits: Learning better decisions" from podcasts like "DataCafé" and more!

    Episodes (1)

    Multi-Armed Bandits: Learning better decisions

Have you ever wondered why you keep getting adverts for products that you've only just bought and now don't need? The automated ad server is probably using a multi-armed bandit learner that needs a little algorithmic improvement. We speak to Ciara Pike-Burke about her work on trying to make multi-armed bandits smarter and more useful.

The multi-armed bandit problem is a classic reinforcement learning problem: we are given a slot machine with n arms (the "bandits"), each arm having its own rigged probability distribution of success. Pulling an arm yields a stochastic reward of either R=+1 for success or R=0 for failure. Our objective is to pull the arms one by one, in sequence, so as to maximize the total reward collected in the long run. In the world of data science, the bandit arms are possible decisions that can be taken, the reward is the possible win you get from taking a decision, and the uncertainty in the problem is what makes it hard (and exciting!).
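The episode doesn't prescribe a particular algorithm, but the setting above can be sketched with a simple epsilon-greedy learner on Bernoulli arms (the arm probabilities, pull budget, and epsilon below are illustrative assumptions, not from the episode):

```python
import random

def epsilon_greedy(success_probs, n_pulls=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a Bernoulli bandit.

    success_probs: the hidden success probability of each arm
    (reward R=+1 on success, R=0 on failure).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # how many times each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # explore: pull a random arm
            arm = rng.randrange(n_arms)
        else:
            # exploit: pull the arm with the best estimated mean so far
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        # incremental update of the running mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough pulls the learner concentrates on the highest-probability arm while still spending a fraction epsilon of pulls exploring, which is exactly the explore/exploit trade-off that makes the problem interesting.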

    With interview guest Dr. Ciara Pike-Burke from the Universitat Pompeu Fabra (Barcelona)
    https://sites.google.com/view/cpikeburke

    Further Reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 29 Jan. 2020
    Interview date: 10 Jan. 2020

Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast, and feel free to contact us about anything you've heard here or anything you think would be an interesting topic in the future.