“Source separation” is a broad research area, ranging from the decomposition of EEG signals to the separation of telecommunication channels. Here we focus on speech signals and begin with the obligatory cocktail party problem:
Imagine a cocktail party where several speakers talk simultaneously. A keen listener may try to understand a single speaker but struggles with interfering speakers and background noise.
Our past research projects aimed to solve this problem by leveraging phase and level differences between microphones in a model-based clustering approach that estimates a spectral mask for each target speaker. This approach showed promising results in the fifth CHiME challenge. However, it became apparent that it is limited: it cannot exploit knowledge learned from an extensive database, and it does not make use of speech characteristics.
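To make the clustering idea concrete, the sketch below clusters time-frequency bins of a two-channel recording by their interchannel phase difference and turns the cluster assignments into binary masks. This is a deliberately minimal illustration: the phase-difference feature, the plain k-means step, and all array shapes are assumptions for this example, not the actual model-based (probabilistic) clustering used in our work.

```python
import numpy as np

def phase_feature(X1, X2):
    """Interchannel phase-difference feature per time-frequency bin.

    X1, X2: complex STFT matrices (freq x time) of two microphones.
    Returns unit-norm 2-D features (cosine and sine of the phase difference),
    which avoids the 2*pi wrap-around of raw angles.
    """
    ipd = np.angle(X1 * np.conj(X2))  # interchannel phase difference
    return np.stack([np.cos(ipd), np.sin(ipd)], axis=-1)

def kmeans_masks(features, n_src=2, n_iter=20, seed=0):
    """Cluster TF bins by their spatial features; return one binary mask per source."""
    F, T, D = features.shape
    pts = features.reshape(-1, D)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), n_src, replace=False)]
    for _ in range(n_iter):
        # assign each TF bin to its nearest cluster center
        dist = ((pts[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # move each center to the mean of its assigned bins
        for k in range(n_src):
            if np.any(labels == k):
                centers[k] = pts[labels == k].mean(0)
    return [(labels == k).reshape(F, T).astype(float) for k in range(n_src)]
```

Each mask can then be multiplied onto the STFT of a reference channel to extract one speaker; since the masks partition the TF plane, they sum to one in every bin.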
A more recent outcome is a discriminatively trained neural network that provides spectral masks from which beamforming coefficients are estimated in a multi-channel setting. Motivated by its efficacy demonstrated in the third and fourth CHiME challenges, our current research focuses on:
- Discriminatively trained spectral masking,
- Synergy of model-based source separation with neural networks,
- Source separation as a front-end for automatic speech recognition.
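How an estimated mask turns into beamforming coefficients can be sketched as follows: the mask weights the multi-channel STFT to form speech and noise spatial covariance matrices, from which a per-frequency beamformer is computed. The sketch uses a standard MVDR formulation for illustration; the specific beamforming criterion, regularization constants, and function names here are assumptions of this example, not a description of our exact system.

```python
import numpy as np

def spatial_covariance(X, mask):
    """Mask-weighted spatial covariance, one matrix per frequency bin.

    X: complex STFT tensor (channels x freq x time); mask: (freq x time) in [0, 1].
    """
    weighted = X * mask[None]  # emphasize the TF bins selected by the mask
    cov = np.einsum('cft,dft->fcd', weighted, X.conj())
    return cov / np.maximum(mask.sum(-1)[:, None, None], 1e-8)

def mvdr_weights(phi_speech, phi_noise, ref=0):
    """MVDR beamformer per frequency from speech/noise covariance matrices."""
    F, C, _ = phi_speech.shape
    w = np.zeros((F, C), dtype=complex)
    for f in range(F):
        # solve Phi_n^{-1} Phi_s (with small diagonal loading for stability)
        num = np.linalg.solve(phi_noise[f] + 1e-6 * np.eye(C), phi_speech[f])
        w[f] = num[:, ref] / np.maximum(np.trace(num).real, 1e-8)
    return w

def beamform(X, w):
    """Apply per-frequency weights: y[f, t] = w[f]^H x[f, t]."""
    return np.einsum('fc,cft->ft', w.conj(), X)
```

In a full pipeline, the network would supply the speech mask (and its complement, or a separate noise mask), and the single-channel beamformer output would be passed on to an automatic speech recognition back-end.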