RL-Lec11 Deep Reinforcement Learning with Q-Functions
Q-Learning
Recall that the Q-Learning update rule is
$q(s_t,a_t)\leftarrow q(s_t,a_t)+\alpha\big(R_{t+1}+\gamma\max\limits_{a'\in\mathcal A}q(s_{t+1},a')-q(s_t,a_t)\big).$
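As a minimal sketch of the update above in the tabular case, the following Python snippet applies one step to a NumPy array of action values. The function name `q_learning_update`, the array layout `q[state, action]`, the default values of `alpha` and `gamma`, and the `done` flag for terminal transitions are all assumptions made for illustration; they do not come from the lecture.

```python
import numpy as np


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    q is a 2-D array indexed as q[state, action] (hypothetical layout).
    """
    # Bootstrap from the greedy action value at the next state.
    # Assumption of this sketch: a terminal next state contributes no future value.
    bootstrap = 0.0 if done else float(np.max(q[s_next]))
    # TD error: R_{t+1} + gamma * max_a' q(s_{t+1}, a') - q(s_t, a_t)
    td_error = r + gamma * bootstrap - q[s, a]
    # Move the current estimate a fraction alpha toward the TD target.
    q[s, a] += alpha * td_error
    return td_error


# Toy example: 3 states, 2 actions, all values zero except state 1.
q = np.zeros((3, 2))
q[1] = [0.5, 2.0]
delta = q_learning_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

In this toy example the target is $1 + 0.9 \times 2.0 = 2.8$, so the TD error is about 2.8 and `q[0, 1]` moves from 0 to about 1.4; only the visited entry changes.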
In action-value function approximation, by contrast, we parameterize the Q-function as $\hat{q}(s,a,\pmb w...