Has any work been done on networks that can generate their own labels?

The thought is that humans can improve at games like chess without being told explicitly which moves are good. Someone can play lots of games and start to label certain ideas as good or bad. Maybe they notice that controlling the center leads to more wins, and that moving the same piece many times leads to losses. With this in mind, I am wondering whether this type of idea has been implemented before: learning from only wins and losses. If so, can it be extended to learning that isn't associated with games?

