So some of you might have been following the Twitter debate between Gary Marcus and Tom Dietterich, after Gary’s “Deep Learning: A Critical Appraisal” came out last week. (r/ML discussion on Gary’s manuscript)
In Tom’s excellent rebuttal (https://twitter.com/tdietterich/status/950053197179109377), I came across a line that I did not quite understand: “Training using REINFORCE overcomes limits of fixed depth BPTT”
How does REINFORCE do so? Could someone point me to a reference where REINFORCE is explicitly used this way?