Hi, I was wondering if an appropriate way to account for in NN is to evaluate the network with several random seeds (like ) on a public and then produce the average and confidence intervals based on the .
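
To make the idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: `evaluate` is a hypothetical stand-in for "train and test the network with this seed" and just returns a made-up noisy score, the five seeds are arbitrary, and the interval is a plain Student's t interval on the mean, which treats the per-seed scores as roughly normal and independent.

```python
import math
import random
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def evaluate(seed):
    """Hypothetical placeholder: a real version would train and test the network."""
    rng = random.Random(seed)
    return 0.90 + rng.gauss(0.0, 0.01)  # made-up score, not a real result


def mean_and_ci(scores):
    """Return (mean, lower, upper) for a 95% t interval on the mean score."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    sem = statistics.stdev(scores) / math.sqrt(n)
    # Beyond 10 degrees of freedom, fall back to the normal value 1.96,
    # which gives a slightly too narrow interval.
    t_crit = T_CRIT_95.get(n - 1, 1.96)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


seeds = [0, 1, 2, 3, 4]  # arbitrary choice of seeds for the sketch
scores = [evaluate(s) for s in seeds]
mean, lower, upper = mean_and_ci(scores)
print(f"mean = {mean:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

With only a handful of seeds the t critical value is large, so the interval comes out wide; that seems to be the honest cost of using few runs.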

Also, I’m having a hard time figuring out how to determine whether my model’s test set results differ from other results in a statistically significant way. How would you do this? I’m especially unsure when the results being compared are based on a single random seed or on the maximum over several random seeds (or even on the average over a certain number of random seeds, though I’ve never seen this done before).

