Abstract: Learning representations of data in an unsupervised way is still an open problem in machine learning. We consider representations of speech learned using autoencoders equipped with WaveNet decoders. In this setup, the encoder only needs to provide the information that cannot be inferred by the autoregressive decoder. This allows the learned representation to capture high-level semantic content of the signal, e.g. phoneme identities, while remaining invariant to confounding low-level details such as the underlying pitch contour or background noise.
I will show how the design choices of the autoencoder, such as the kind of bottleneck and its hyperparameters, impact the induced latent representation. I will also show applications to unsupervised acoustic unit discovery on the ZeroSpeech task.
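To make the setup concrete, below is a minimal PyTorch-style sketch of the general idea described in the abstract: a strided convolutional encoder compresses the waveform into a low-rate bottleneck, and a causal (autoregressive, WaveNet-style) decoder predicts each sample from past samples conditioned on that bottleneck. All layer sizes, strides, and the plain continuous bottleneck are illustrative assumptions, not the speaker's exact model.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Strided convolutions that downsample the waveform into latent frames."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv1d(128, latent_dim, kernel_size=4, stride=4),
        )  # overall stride: one latent frame per 64 waveform samples

    def forward(self, wav):              # wav: (B, 1, T)
        return self.net(wav)             # latents: (B, latent_dim, T // 64)


class CausalDecoder(nn.Module):
    """Dilated causal convolutions over past samples, conditioned on the latents."""
    def __init__(self, latent_dim=64, channels=128, n_layers=6):
        super().__init__()
        self.input = nn.Conv1d(1, channels, kernel_size=1)
        self.cond = nn.Conv1d(latent_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
            for i in range(n_layers)
        ])
        self.output = nn.Conv1d(channels, 256, kernel_size=1)  # e.g. 8-bit mu-law logits

    def forward(self, wav, latents):
        # Upsample the low-rate latents back to sample rate and add as conditioning.
        cond = torch.repeat_interleave(self.cond(latents), 64, dim=-1)
        x = self.input(wav) + cond[..., :wav.size(-1)]
        for layer in self.layers:
            pad = (layer.dilation[0], 0)                   # left-pad only => causal
            x = x + torch.relu(layer(nn.functional.pad(x, pad)))
        return self.output(x)                              # per-sample logits: (B, 256, T)


if __name__ == "__main__":
    wav = torch.randn(2, 1, 4096)        # toy batch of waveforms
    enc, dec = Encoder(), CausalDecoder()
    z = enc(wav)                         # low-rate latent code
    logits = dec(wav, z)                 # next-sample predictions conditioned on z
    print(z.shape, logits.shape)         # torch.Size([2, 64, 64]) torch.Size([2, 256, 4096])
```

Training would pair the decoder's per-sample logits with a cross-entropy loss against the quantized waveform; the narrower the bottleneck, the more the decoder must reconstruct from its own autoregressive context rather than from the latents.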
Bio: Jan Chorowski is an Associate Professor at the Faculty of Mathematics and Computer Science at the University of Wrocław. He received his M.Sc. degree in electrical engineering from the Wrocław University of Technology, Poland, and his Ph.D. in electrical engineering from the University of Louisville, Kentucky, in 2012. He has worked with several research teams, including Google Brain, Microsoft Research, and Yoshua Bengio's lab at the University of Montreal. His research interests are applications of neural networks to problems that are intuitive and easy for humans but difficult for machines, such as speech and natural language processing.