The thought is that humans can improve at games like chess without being told explicitly what good moves are. Someone can play lots of games and start to label certain ideas as good or bad. Maybe they notice that controlling the center leads to more wins, and that moving the same piece many times leads to losses. With this in mind, I am wondering whether this type of idea has been implemented before: learning from only wins and losses. If so, can it be extended to learning that isn’t associated with games?
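
To make the idea concrete, here is a rough sketch of the kind of learning I have in mind. Everything in it is made up for illustration: the toy `play_game` simulator, the two habit names, and the hidden win probabilities are assumptions, not a real chess engine or an existing method. The learner is never told which habits are good; it only tallies wins and losses for the games in which each habit appeared.

```python
import random
from collections import defaultdict


def play_game(rng):
    """Simulate one toy 'game' and return (habits_used, won).

    The hidden win probabilities below are invented for illustration;
    the learner never gets to see them.
    """
    habits = {
        "center_control": rng.random() < 0.5,
        "repeated_piece_moves": rng.random() < 0.5,
    }
    p_win = 0.5
    if habits["center_control"]:
        p_win += 0.2  # assumed: helps
    if habits["repeated_piece_moves"]:
        p_win -= 0.2  # assumed: hurts
    return habits, rng.random() < p_win


def learn_from_outcomes(n_games, seed=0):
    """Estimate a win rate per habit using only final win/loss results."""
    rng = random.Random(seed)
    wins = defaultdict(int)
    games = defaultdict(int)
    for _ in range(n_games):
        habits, won = play_game(rng)
        for name, used in habits.items():
            if used:
                games[name] += 1
                wins[name] += won  # True counts as 1, False as 0
    return {name: wins[name] / games[name] for name in games}


scores = learn_from_outcomes(20000)
for name, rate in sorted(scores.items()):
    label = "good" if rate > 0.5 else "bad"
    print(f"{name}: win rate {rate:.2f} -> {label}")
```

With enough games, the habit that secretly helps ends up with a win rate above 50% and the one that secretly hurts ends up below it, even though no individual move was ever labeled.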