We use mini-batch statistics during training, and population statistics during testing (usually estimated with some approximation, such as exponential moving averages).
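Here is a minimal pure-Python sketch of that train/test split for a single feature. The class name `BatchNorm1D`, the `momentum=0.1` and `eps=1e-5` defaults, and the use of the biased batch variance for both normalization and the running average are illustrative assumptions. The learnable scale and shift are left out for brevity, and this is not any particular library's implementation.

```python
import math


class BatchNorm1D:
    """Hypothetical minimal batch norm over one scalar feature (a sketch).

    Training: normalize with the current mini-batch mean/variance and update
    exponential moving averages of them.
    Inference: normalize with the moving averages (the "population" estimates).
    The learnable scale/shift (gamma/beta) are omitted for brevity.
    """

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum  # weight given to the newest batch statistic
        self.eps = eps            # avoids division by zero
        self.running_mean = 0.0
        self.running_var = 1.0

    def __call__(self, batch, training):
        if training:
            n = len(batch)
            mean = sum(batch) / n
            # Biased (divide-by-n) batch variance, used for both purposes here.
            var = sum((x - mean) ** 2 for x in batch) / n
            # Exponential moving averages, used later at test time.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = BatchNorm1D()
train_out = bn([1.0, 2.0, 3.0], training=True)  # uses batch mean 2.0
test_out = bn([0.2], training=False)            # uses running mean/var
```

At inference the layer works even on a batch of one, because it no longer depends on the batch it is given. During training the output for each example depends on the other examples in the same mini-batch.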

With small mini-batches, mini-batch statistics seem to be a poor choice.
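One way to make the small-batch concern concrete is to measure how much the mini-batch mean fluctuates from batch to batch. The sketch below assumes synthetic activations drawn from a standard normal distribution. Under that assumption the spread of the batch mean should shrink roughly as 1/sqrt(batch size), so batches of 2 give far noisier estimates than batches of 128.

```python
import random
import statistics


def spread_of_batch_means(batch_size, num_batches=2000, seed=0):
    """Std. dev. of the mini-batch mean across many batches drawn from N(0, 1)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(batch_size))
        for _ in range(num_batches)
    ]
    return statistics.pstdev(means)


for bs in (2, 8, 32, 128):
    # Expect values close to 1 / sqrt(bs), up to sampling noise.
    print(bs, round(spread_of_batch_means(bs), 3))
```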

I wonder why we don't make more use of some kind of exponential average during training.



Source link
( https://www.reddit.com/r//comments/9d37rc/d_why__we_use__statistics_for_batch/)
