If I understand correctly, the `` in OpenAI Baselines computes a running average of the observations and immediate rewards. However, that is rather different from normalizing the discounted returns for gradient estimation, i.e. `-log p(a|s)*R'` where `R' = (R - R.mean())/R.std()`.
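
To make the contrast concrete, here is a minimal sketch in plain Python. It follows my description above rather than the library's actual source, so treat every name in it (`RunningMeanStd`, `normalize_online`, `normalize_batch`, the `gamma` and `eps` values) as a hypothetical stand-in, not the Baselines API. The first path rescales each immediate reward with statistics accumulated so far; the second computes discounted returns for a whole batch and then standardizes them, which is the `R'` in the formula.

```python
import math
from statistics import fmean, pstdev


class RunningMeanStd:
    """Hypothetical helper: running mean/variance via Welford's update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.count) if self.count > 0 else 0.0


def normalize_online(rewards, eps=1e-8):
    """Rescale each immediate reward using only the statistics seen so far."""
    stats = RunningMeanStd()
    out = []
    for r in rewards:
        stats.update(r)
        out.append((r - stats.mean) / (stats.std + eps))
    return out


def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1}, working backwards in time."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize_batch(returns, eps=1e-8):
    """R' = (R - R.mean()) / R.std(), computed over the whole batch."""
    mean = fmean(returns)
    std = pstdev(returns)
    return [(g - mean) / (std + eps) for g in returns]


rewards = [1.0, 0.0, 2.0, 1.0]
online = normalize_online(rewards)
batch = normalize_batch(discounted_returns(rewards, gamma=0.9))
```

The batch version has zero mean and unit standard deviation by construction, while the running version depends on the order in which rewards arrive and never looks at future rewards, so the two generally give different scalings for the same trajectory.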



Source link
( https://www.reddit.com/r//comments/90edng/d_how_different_is__vecnormalize/)
