I've been using machine learning in a side project where I train an algorithm to play a video game, just to see how well it works. The neural network (NN) decides which actions to perform in-game and then evaluates how successful those actions were. For each action, it stores a snapshot of the scenario it was in when it executed the action, together with a value indicating how successful the action was. These stored snapshots and values form my training set.
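To make the setup concrete, here is a minimal sketch of how such a training set could be accumulated, assuming each snapshot can be flattened into a list of numbers and actions are identified by an integer index. `Experience`, `TrainingSet`, and `record` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Experience:
    """One training example (hypothetical structure): scenario, action, outcome."""
    snapshot: tuple  # game state encoded as numbers (assumed encoding)
    action: int      # index of the action the network chose
    success: float   # score assigned to the outcome of that action


class TrainingSet:
    """Hypothetical container that accumulates experiences as the agent plays."""

    def __init__(self) -> None:
        self.examples: List[Experience] = []

    def record(self, snapshot: Sequence[float], action: int, success: float) -> None:
        # Copy the snapshot into a tuple so later game updates can't mutate it.
        self.examples.append(Experience(tuple(snapshot), action, success))

    def __len__(self) -> int:
        return len(self.examples)


# Usage: after each action is evaluated, store what happened.
data = TrainingSet()
data.record([0.5, 1.0, 0.0], action=2, success=0.8)
data.record([0.1, 0.0, 1.0], action=0, success=-0.3)
print(len(data))  # 2
```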
I've noticed something that I can only describe as "learned constriction": once the NN has seen an action fail in a given scenario enough times, it never tries that action in that scenario again. The game has a lot of randomness, so the data it produces is quite noisy, and an action that failed a few times may simply have been unlucky. Rather than just slowing down the NN's learning rate, I would like to implement something that makes it experiment a bit with the actions it tries.
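The constriction effect can be reproduced without a neural network at all. This hypothetical sketch replaces the NN with a running-average value estimate per action and always picks the best-looking action (pure greedy selection). The scripted game in `scripted_reward` is an assumption: action 0 is better on average, but its outcome at step 0 is an unlucky -2.0, which is enough for a greedy learner to abandon it for good.

```python
from typing import Callable, List


def greedy_choice(estimates: List[float]) -> int:
    # Pick the action with the highest estimated value; ties go to the lowest index.
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_greedy(reward_fn: Callable[[int, int], float], n_actions: int, steps: int) -> List[int]:
    """Play `steps` rounds greedily; return how many times each action was tried."""
    estimates = [0.0] * n_actions  # stand-in for the NN's value predictions
    counts = [0] * n_actions
    for t in range(steps):
        a = greedy_choice(estimates)
        r = reward_fn(a, t)
        counts[a] += 1
        # Incremental running mean of the rewards seen for action a.
        estimates[a] += (r - estimates[a]) / counts[a]
    return counts


def scripted_reward(action: int, t: int) -> float:
    # Hypothetical noisy game: action 0 normally pays 1.0,
    # but the outcome at step 0 is an unlucky -2.0.
    if action == 0:
        return -2.0 if t == 0 else 1.0
    return 0.5  # action 1 is a safe but mediocre choice


print(run_greedy(scripted_reward, n_actions=2, steps=100))  # [1, 99]
```

Because action 0 is never chosen again, its estimate is never corrected: the same feedback loop as an NN that stops generating training data for an action it has written off.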
I'm wondering if there's something like a standard procedure for this type of scenario, to keep an online learning model from incorrectly modeling noisy data. Is simply choosing a random action every so often enough? Ideally I would also be able to measure its in-game performance while it is experimenting, but that's the best-case scenario.
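Occasional random selection is commonly called ε-greedy exploration. Below is a minimal sketch on a toy problem like the one above, again with running averages standing in for the NN: with probability `epsilon` the agent picks a uniformly random action, otherwise it picks the best current estimate. Rewards from non-random steps are logged separately, as one rough way to track performance while still experimenting. `make_noisy_game`, `epsilon_greedy_choice`, and `run_epsilon_greedy` are hypothetical names, and the toy game is an assumption.

```python
import random
from typing import Callable, List, Tuple


def make_noisy_game() -> Callable[[int], float]:
    """Hypothetical stand-in for the game: action 0 pays 1.0 except that its
    first outcome is an unlucky -2.0; action 1 always pays a mediocre 0.5."""
    tries = [0]

    def reward(action: int) -> float:
        if action == 0:
            tries[0] += 1
            return -2.0 if tries[0] == 1 else 1.0
        return 0.5

    return reward


def epsilon_greedy_choice(
    estimates: List[float], epsilon: float, rng: random.Random
) -> Tuple[int, bool]:
    """Return (action, explored). With probability epsilon, explore at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates)), True
    # Exploit: best current estimate, ties go to the lowest index.
    return max(range(len(estimates)), key=lambda a: estimates[a]), False


def run_epsilon_greedy(
    reward_fn: Callable[[int], float],
    n_actions: int,
    steps: int,
    epsilon: float,
    seed: int = 0,
) -> Tuple[List[float], List[int], List[float]]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    estimates = [0.0] * n_actions  # running mean reward per action (NN stand-in)
    counts = [0] * n_actions
    exploit_rewards: List[float] = []  # rewards from non-random steps only
    for _ in range(steps):
        action, explored = epsilon_greedy_choice(estimates, epsilon, rng)
        r = reward_fn(action)
        counts[action] += 1
        estimates[action] += (r - estimates[action]) / counts[action]
        if not explored:
            exploit_rewards.append(r)
    return estimates, counts, exploit_rewards


estimates, counts, exploit = run_epsilon_greedy(
    make_noisy_game(), n_actions=2, steps=2000, epsilon=0.1
)
print(counts)
print(sum(exploit) / len(exploit))  # average reward of non-random moves
```

With `epsilon=0.0` this reduces to the greedy learner, and action 0 is abandoned after its first bad outcome; with a small positive `epsilon`, the unlucky action keeps getting re-sampled until its estimate recovers. In practice `epsilon` is often decayed over time so the agent explores less as its estimates improve.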