Abstract:
Audio source separation is a challenging problem that consists of identifying the different sources present in a mixed signal, either using traditional model-based methods or deep learning algorithms. In this work, we propose two different paradigms for combining a model-based method, nonnegative matrix factorization (NMF), with neural networks to take advantage of both. The first approach fuses the NMF and a deep neural network (DNN) in a stack of two sequential stages, where the DNN enhances the separation of the signals by refining the spectrograms/gains estimated by the NMF.
Two autoencoder-based architectures, each handling a different kind of input data, are presented in this thesis. The second approach is based on the deep unfolding paradigm: the optimization algorithm of the model-based method is unrolled into the layers of a deep network, which is then trained using deep learning techniques.