Hey Everyone, I got a question:

I am trying to test recognition using a CNN + BLSTMP network (https://github.com/espnet/espnet). The result is comparable with that using only BLSMTP. However, when I tried to add to the CNN layers, the performance becomes worst. I already tried implementing a NN with similar characteristics but for another sequence application and improves the result even when the decode batchsize is 1. Has anyone tried in CNN for sequence .? Is working that well? is that depend on the framework (TF, Chainer or Pytorch) or BN does not work well for Speech in sequence (will need ref) but it will be better to implement a normalization along the sequence?


Source link
thanks you RSS link
( https://www.reddit.com/r//comments/8xsv9q/d_speech__training_and_bn/)


Please enter your comment!
Please enter your name here