We use mini-batch statistics during training and population statistics during testing. The population statistics are estimated with some kind of approximation, such as an exponential moving average.
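For concreteness, here is a minimal NumPy sketch of that setup (forward pass only, no gradients). The class name `BatchNorm1D` and the `momentum`/`eps` defaults are illustrative assumptions, not any particular library's implementation. In training mode it normalizes with the current mini-batch's mean and variance and updates exponential moving averages of them. In test mode it normalizes with those moving averages instead.

```python
import numpy as np

class BatchNorm1D:
    """Minimal batch normalization over axis 0 (NumPy, forward pass only)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # Learnable scale and shift (kept fixed here, since there is no training loop).
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        # Running estimates of the population statistics, used at test time.
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def forward(self, x, training):
        if training:
            # Training: normalize with the statistics of this mini-batch.
            mean = x.mean(axis=0)
            var = x.var(axis=0)  # biased (divide by batch size)
            # Update the exponential moving averages of the population statistics.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Test: normalize with the running (population) estimates.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Usage: feed batches drawn from N(5, 2^2); the running mean drifts toward 5.
rng = np.random.default_rng(0)
bn = BatchNorm1D(3)
for _ in range(200):
    bn.forward(rng.normal(5.0, 2.0, size=(32, 3)), training=True)
test_out = bn.forward(rng.normal(5.0, 2.0, size=(4, 3)), training=False)
```

Frameworks differ in details (for example, whether the running variance uses the unbiased estimator), so treat this as an illustration of the idea only.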

With small mini-batches, mini-batch statistics seem to be a poor choice, since they are noisy estimates of the population statistics.
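A quick way to see the problem with small batches is to measure how much the per-batch statistics fluctuate. The sketch below draws many synthetic mini-batches from an assumed N(5, 2²) distribution (the distribution, the seed, and the helper name `batch_stat_spread` are illustrative choices) and reports the spread of the batch mean and batch variance for a few batch sizes. The spread of the batch mean should shrink roughly like σ/√B for batch size B.

```python
import numpy as np

def batch_stat_spread(batch_size, num_batches=10_000, mu=5.0, sigma=2.0, seed=0):
    """Return the std (over many mini-batches) of the per-batch mean and variance."""
    rng = np.random.default_rng(seed)
    batches = rng.normal(mu, sigma, size=(num_batches, batch_size))
    means = batches.mean(axis=1)
    variances = batches.var(axis=1)  # biased (divide by batch size)
    return means.std(), variances.std()

for b in (2, 8, 64):
    mean_spread, var_spread = batch_stat_spread(b)
    print(f"batch size {b:3d}: std of batch mean {mean_spread:.3f}, "
          f"std of batch var {var_spread:.3f}")
```

With a batch size of 2, the batch mean fluctuates by about σ/√2 around the true mean, so each training step normalizes with quite different statistics than the ones used at test time.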

So I wonder: why don't we also use some kind of exponential average during training?
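What this proposal could look like is sketched below as a hypothetical, forward-only NumPy function: fold the current batch into the moving averages, then normalize with the moving averages rather than the batch statistics. `ema_normalize_step` is an illustrative name, not an existing API. Unlike standard batch normalization, the normalized batch no longer has exactly zero mean and unit variance, and only a `momentum` fraction of the statistics comes from the current batch. The sketch says nothing about gradients; the original batch-normalization paper (Ioffe & Szegedy, 2015) argues that the gradient step has to account for the normalization's dependence on the parameters, otherwise parameters can grow without the loss changing, which is one complication for this idea.

```python
import numpy as np

def ema_normalize_step(x, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Hypothetical variant: normalize a training batch with moving averages.

    x has shape (batch_size, num_features). Returns the normalized batch and
    the updated moving averages.
    """
    # Statistics of the current mini-batch (biased variance).
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    # Fold the current batch into the exponential moving averages first ...
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    # ... then normalize with the moving averages instead of the batch statistics.
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return x_hat, running_mean, running_var

# Usage with tiny batches of size 4 drawn from an assumed N(3, 1.5^2).
rng = np.random.default_rng(1)
rm, rv = np.zeros(2), np.ones(2)
for _ in range(100):
    x_hat, rm, rv = ema_normalize_step(rng.normal(3.0, 1.5, size=(4, 2)), rm, rv)
```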

Source: https://www.reddit.com/r//comments/9d37rc/d_why__we_use__statistics_for_batch/