So some of you might have been following the Twitter debate between Gary Marcus and Tom Dietterich after Gary’s “Deep Learning: A Critical Appraisal” came out last week. (r/ML discussion on Gary’s manuscript)

In Tom’s excellent rebuttal, I came across a line that I did not quite understand: “Training using [REINFORCE] overcomes … of …”

How does REINFORCE do so? Could someone point me to a reference where this is explicitly being used?
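For context, here is a minimal sketch of the score-function (REINFORCE) estimator, assuming the quoted line is about getting gradients through a non-differentiable step such as sampling a discrete value. The Bernoulli policy, the `reward` function and its values (3.0 and 1.0), and the helper names are all hypothetical choices for illustration. The trick is that REINFORCE never differentiates through the sample or the reward; it only needs the gradient of the log-probability of the sample, using ∇θ E[f(x)] = E[f(x) ∇θ log p(x; θ)].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reward(x):
    # Hypothetical black-box reward on a discrete sample:
    # a lookup, so there is no useful derivative with respect to x.
    return np.where(x == 1, 3.0, 1.0)

def reinforce_grad(theta, n_samples=200_000, seed=0):
    # Monte Carlo estimate of d/dtheta E_{x ~ Bernoulli(sigmoid(theta))}[reward(x)].
    rng = np.random.default_rng(seed)
    p = sigmoid(theta)
    # Discrete sampling step: gradients cannot flow through this line.
    x = (rng.random(n_samples) < p).astype(int)
    # Score function: d/dtheta log Bernoulli(x; sigmoid(theta)) = x - p.
    score = x - p
    return float(np.mean(reward(x) * score))

def exact_grad(theta):
    # Closed form for comparison: d/dtheta [p*f(1) + (1-p)*f(0)] = p(1-p)(f(1) - f(0)).
    p = sigmoid(theta)
    return float(p * (1 - p) * (reward(np.array(1)) - reward(np.array(0))))

print(reinforce_grad(0.5), exact_grad(0.5))
```

The two printed values should agree up to Monte Carlo noise. In practice a baseline is usually subtracted from the reward to reduce the variance of this estimator.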

