Reinforcement Learning Force Tube - Row-Major (Taahir Ahmed)

An interactive demonstration of some of the techniques described in Sutton and Barto's Reinforcement Learning: An Introduction (the in-progress second edition).

The algorithm implemented is "Differential Semi-Gradient Sarsa for Control", found in Section 10.3, which treats the average-reward setting for continuing tasks.
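
For readers who want to see the shape of the update rule, here is a minimal sketch of differential semi-gradient Sarsa with a linear action-value function. It is not the code behind this demonstration. The environment (`toy_step`, a single state with two actions), the feature map (`one_hot`), and all step sizes are hypothetical choices made only for illustration.

```python
import random


def differential_sarsa(step, features, n_features, actions, s0,
                       alpha=0.1, beta=0.01, epsilon=0.1,
                       n_steps=10_000, seed=0):
    """Differential semi-gradient Sarsa with linear q(s, a) = w . x(s, a).

    `step(s, a)` returns (next_state, reward); `features(s, a)` returns a
    list of length `n_features`. Both are supplied by the caller.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features      # weights of the linear action-value estimate
    avg_reward = 0.0            # running estimate of the average reward

    def q(s, a):
        return sum(wi * xi for wi, xi in zip(w, features(s, a)))

    def choose(s):
        # Epsilon-greedy action selection over the current estimates.
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q(s, a))

    s = s0
    a = choose(s)
    for _ in range(n_steps):
        s_next, r = step(s, a)
        a_next = choose(s_next)
        # Differential TD error: reward is measured relative to the average.
        delta = r - avg_reward + q(s_next, a_next) - q(s, a)
        avg_reward += beta * delta
        # For a linear estimate, the gradient of q w.r.t. w is x(s, a).
        x = features(s, a)
        for i in range(n_features):
            w[i] += alpha * delta * x[i]
        s, a = s_next, a_next
    return w, avg_reward


# Hypothetical toy task: one state, action 1 pays reward 1, action 0 pays 0.
def toy_step(s, a):
    return 0, float(a)


def one_hot(s, a):
    return [1.0 if a == i else 0.0 for i in range(2)]


w, avg_reward = differential_sarsa(toy_step, one_hot, n_features=2,
                                   actions=[0, 1], s0=0)
print("weights:", w)
print("estimated average reward:", avg_reward)
```

In this toy task the agent should come to prefer action 1, and the average-reward estimate should settle somewhat below 1 because epsilon-greedy exploration still picks action 0 occasionally.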

Timescale

The current action-value functions

Policy Update Frequency

Reward history