Title: ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech

Authors: Wei Ping, Kainan Peng, Jitong Chen

Abstract: In this work, we propose an alternative solution for parallel wave generation by WaveNet. In contrast to parallel WaveNet (Oord et al., ), we distill a Gaussian inverse autoregressive flow from the autoregressive WaveNet by minimizing a novel regularized KL divergence between their highly peaked output distributions. Our method computes the KL divergence in closed form, which simplifies the algorithm and provides very efficient distillation. In addition, we propose the first text-to-wave neural architecture for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch. It significantly outperforms the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet (Ping et al., 2017). We also successfully distill a parallel waveform synthesizer conditioned on the hidden representation in this end-to-end model.
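
To make the closed-form claim concrete, here is a minimal sketch of the KL divergence between two univariate Gaussians, which has a well-known analytic expression and therefore needs no sampling. The abstract does not spell out the regularizer, so the squared log-scale penalty and its weight `lam` in `regularized_kl` are illustrative assumptions, not the authors' confirmed formulation. The function and parameter names are hypothetical.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for q = N(mu_q, sigma_q^2), p = N(mu_p, sigma_p^2)."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


def regularized_kl(mu_q, sigma_q, mu_p, sigma_p, lam=1.0):
    """KL(q || p) plus an assumed penalty on the gap between log scales.

    The penalty form and the default weight `lam` are assumptions made for
    illustration; the abstract only states that the KL is regularized.
    """
    penalty = (math.log(sigma_p) - math.log(sigma_q)) ** 2
    return lam * penalty + gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)


if __name__ == "__main__":
    # Highly peaked distributions: both scales are small, the means match.
    student = (0.0, 0.01)  # (mu_q, sigma_q)
    teacher = (0.0, 0.02)  # (mu_p, sigma_p)
    print("plain KL      :", gaussian_kl(*student, *teacher))
    print("regularized KL:", regularized_kl(*student, *teacher))
```

With means equal, the plain KL depends only on the ratio of the two scales, so it stays the same size however small both scales become. A penalty on the log-scale gap is one plausible way to add a direct signal for matching the scales in that regime.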

Source: https://www.reddit.com/r//comments/90b9k3/r_clarinet_parallel_wave_generation_in_endtoend/