Hey everyone, I have a question:
I am testing speech recognition with a CNN + BLSTMP network (https://github.com/espnet/espnet). The result is comparable to that of the BLSTMP alone. However, when I add batch normalization (BN) to the CNN layers, performance gets worse. I have already implemented a network with similar characteristics for a different sequence application, and there BN improved the result even with a decoding batch size of 1.

Has anyone tried BN in the CNN layers for sequence training? Does it work well? Does it depend on the framework (TF, Chainer or PyTorch), or does BN simply not work well for speech sequences (I would need a reference for that), so that it would be better to implement a normalization along the sequence?
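
To make the last option concrete, this is roughly what I mean by a normalization along the sequence. It is only a NumPy sketch, not ESPnet code: the function name `normalize_along_sequence` and the (batch, channels, time, freq) layout are my own assumptions, and it ignores padding frames. It is essentially instance normalization: the statistics come from each utterance alone, so the output does not depend on the batch size or on the other utterances in the batch.

```python
import numpy as np


def normalize_along_sequence(x, eps=1e-5):
    """Normalize every channel of every utterance with its own statistics.

    x: array with assumed layout (batch, channels, time, freq).
    Mean and variance are computed over the time and frequency axes,
    separately for each utterance and each channel.
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Hypothetical feature maps: 4 utterances, 3 channels, 50 frames, 8 frequency bins.
rng = np.random.default_rng(0)
feats = rng.normal(loc=2.0, scale=3.0, size=(4, 3, 50, 8))

batch_out = normalize_along_sequence(feats)       # whole batch at once
single_out = normalize_along_sequence(feats[:1])  # first utterance alone (batch size 1)
```

Here `single_out` matches `batch_out[:1]`, which is the property I am after: decoding with batch size 1 sees the same normalization as training on larger batches.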