
    Give that model a treat! : Reinforcement learning explained

    July 22, 2020

    About this Episode

    Switching gears, we focus on how Yannick’s been training his model using reinforcement learning. He explains how it differs from David’s supervised learning approach. We find out how his system performs against a player that makes random tic-tac-toe moves.
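
    To make the “treat” in the title concrete, here’s a minimal sketch (in TypeScript, with made-up names and values; not Yannick’s actual code) of the core difference: instead of being told the right move, the agent only gets a reward when a game ends, and that reward is spread back over the moves that led there.

    // A rough sketch of the "treat": no labeled moves, only a reward at game end.
    // Reward values, the discount factor, and names are illustrative assumptions.

    type Outcome = 'win' | 'loss' | 'draw';

    // The reward handed out once a game finishes.
    function finalReward(outcome: Outcome): number {
      switch (outcome) {
        case 'win': return 1;   // the treat
        case 'loss': return -1;
        case 'draw': return 0;
      }
    }

    // Spread the end-of-game reward back over the agent's moves, so the moves
    // closest to the result get the most credit.
    function discountedRewards(numMoves: number, outcome: Outcome, gamma = 0.9): number[] {
      const rewards: number[] = new Array(numMoves);
      let r = finalReward(outcome);
      for (let i = numMoves - 1; i >= 0; i--) {
        rewards[i] = r;
        r *= gamma;
      }
      return rewards;
    }

    console.log(discountedRewards(4, 'win')); // [0.729, 0.81, 0.9, 1]

    By contrast, in David’s supervised setup every training board arrives with the desired move already attached.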

    Resources: 

    Deep Learning with JavaScript book

    Playing Atari with Deep Reinforcement Learning

    Two Minute Papers episode on Atari DQN

    For more information about the show, check out pair.withgoogle.com/thehardway/.


    You can reach out to the hosts on Twitter: @dweinberger and @tafsiri


    Recent Episodes from Tic-Tac-Toe the Hard Way

    Head to Head: The Even Bigger ML Smackdown!

    Yannick and David’s systems play against each other in 500 games. Who’s going to win? And what can we learn about how the ML may be working by thinking about the results?

    See the agents play each other in Tic-Tac-Two!



    Enter tic-tac-two

    David’s variant of tic-tac-toe that we’re calling tic-tac-two is only slightly different but turns out to be far more complex. This requires rethinking what the ML system will need in order to learn how to play, and how to represent that data.


    Give that model a treat! : Reinforcement learning explained

    Switching gears, we focus on how Yannick’s been training his model using reinforcement learning. He explains how it differs from David’s supervised learning approach. We find out how his system performs against a player that makes random tic-tac-toe moves.



    Beating random: What it means to have trained a model

    David did it! He trained a machine learning model to play tic-tac-toe! (Well, with lots of help from Yannick.) How did the whole training experience go? How do you tell how training went? How did his model do against a player that makes random tic-tac-toe moves?
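
    As a rough idea of what that test looks like, here’s a self-contained sketch (illustrative TypeScript, not the code from the show) that pits an agent against a random-move opponent for a batch of games and tallies wins, draws, and losses.

    type Cell = '' | 'X' | 'O';
    type Player = (board: Cell[]) => number; // returns the index (0-8) of an empty cell

    const LINES = [
      [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
      [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
      [0, 4, 8], [2, 4, 6],            // diagonals
    ];

    // Return 'X' or 'O' if someone has three in a row, otherwise ''.
    function winner(board: Cell[]): Cell {
      for (const [a, b, c] of LINES) {
        if (board[a] !== '' && board[a] === board[b] && board[b] === board[c]) {
          return board[a];
        }
      }
      return '';
    }

    // The baseline opponent: pick any empty cell at random.
    const randomPlayer: Player = (board) => {
      const empty = board.map((c, i) => (c === '' ? i : -1)).filter((i) => i !== -1);
      return empty[Math.floor(Math.random() * empty.length)];
    };

    // Play a single game; playerX always moves first.
    function playGame(playerX: Player, playerO: Player): 'X' | 'O' | 'draw' {
      const board: Cell[] = Array(9).fill('');
      for (let turn = 0; turn < 9; turn++) {
        const mark: Cell = turn % 2 === 0 ? 'X' : 'O';
        const player = mark === 'X' ? playerX : playerO;
        board[player(board)] = mark;
        const w = winner(board);
        if (w !== '') return w;
      }
      return 'draw';
    }

    // Tally how an agent does over many games against random moves.
    function evaluate(agent: Player, games = 100) {
      const tally = { wins: 0, draws: 0, losses: 0 };
      for (let g = 0; g < games; g++) {
        const result = playGame(agent, randomPlayer);
        if (result === 'X') tally.wins++;
        else if (result === 'draw') tally.draws++;
        else tally.losses++;
      }
      return tally;
    }

    // With a random "agent" this is just the baseline; a trained model should beat it comfortably.
    console.log(evaluate(randomPlayer, 100));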


    From tic-tac-toe moves to ML model

    Once we have the data we need (thousands of sample games), how do we turn it into something the ML can train itself on? That means understanding how training works, and what a model is.
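
    One common way to do that conversion, and the idea behind the one-hot encoding resource below, is to turn each cell into a small vector of 0s and 1s. Here’s a minimal sketch in TypeScript; the encoding order and names are assumptions for illustration, not necessarily how David’s data ends up shaped.

    // One-hot encode a tic-tac-toe board: each of the 9 cells becomes a
    // three-element vector [empty, X, O], so the whole board becomes 27 numbers.
    // Illustrative only; the encoding used on the show may differ.

    type Cell = '' | 'X' | 'O';
    type Board = Cell[]; // 9 cells in row-major order

    const ONE_HOT: Record<Cell, number[]> = {
      '':  [1, 0, 0],
      'X': [0, 1, 0],
      'O': [0, 0, 1],
    };

    // Flatten the whole board into one feature vector a model can train on.
    function encodeBoard(board: Board): number[] {
      return board.flatMap((cell) => ONE_HOT[cell]);
    }

    // Example: O in the top-left corner, X in the center.
    const board: Board = ['O', '', '', '', 'X', '', '', '', ''];
    console.log(encodeBoard(board)); // 27 zeros and ones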

    Resources:
    See a definition of one-hot encoding


    What does a tic-tac-toe board look like to machine learning?

    How should David represent the data needed to train his machine learning system? What does a tic-tac-toe board “look” like to ML? Should he train it on games or on individual boards? How does this decision affect how and how well the machine will learn to play? Plus, an intro to reinforcement learning, the approach Yannick will be taking.
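
    As a toy illustration of the “games or individual boards” question, here’s one way (a sketch under assumptions, not David’s actual choice) to slice a recorded game into per-board training pairs, keeping only the eventual winner’s moves as the ones worth imitating.

    type Cell = '' | 'X' | 'O';

    interface GameRecord {
      moves: number[];            // cell indices 0-8, in the order played; X moves first
      winner: 'X' | 'O' | 'draw';
    }

    interface Example {
      board: Cell[]; // the board as it looked before the move
      move: number;  // the cell the eventual winner chose from that board
    }

    // Turn one recorded game into (board, move) pairs from the winning side.
    function gameToExamples(game: GameRecord): Example[] {
      const board: Cell[] = Array(9).fill('');
      const examples: Example[] = [];
      game.moves.forEach((move, i) => {
        const player: 'X' | 'O' = i % 2 === 0 ? 'X' : 'O';
        if (player === game.winner) {
          examples.push({ board: [...board], move });
        }
        board[move] = player;
      });
      return examples;
    }

    // Example: X wins along the diagonal 0, 4, 8 against O's 1 and 2.
    const examples = gameToExamples({ moves: [0, 1, 4, 2, 8], winner: 'X' });
    console.log(examples.length); // 3 training pairs, one per winning-side move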


    Howdy, and the myth of “pouring in data”

    Welcome to the podcast! We’re Yannick and David, a software engineer and a non-technical writer. Over the next 9 episodes we’re going to use two different approaches to build machine learning systems that play two versions of tic-tac-toe. Building a machine learning app requires humans to make a lot of decisions. We start by agreeing that David will use a “supervised learning” approach while Yannick will go with “reinforcement learning.”
