Spectral Denoising

Denoising of signals distorted by an additive nonstationary noise is of great interest in many applications even beyond speech signal processing. Since a time-frequency representation of speech signals is sparse, an enhancement  of the short-time Fourier transform (STFT) coefficients of a single-channel noisy signal is often used to remove noise. Such spectral speech enhancement systems usually work in two domains, first, in the power spectral density (PSD) domain to estimate a PSD of the noise signal and, second, in the signal-to-noise ratio (SNR) domain to calculate a spectral gain function, which is used to enhance the noisy STFT coeffitients. Since clean speech and noise signals can be satisfactorily modelled as two independent random processes, statistical model-based approaches can be developed for denoising.

Noise PSD estimation

Estimation of spectral statistics of the noise signal is a challenging task especially in the presence of nonstationary noise. Considering that the signal processing in the SNR domain depends on the quality of the noise PSD estimation, robust and accurate estimators are required here.

Spectral enhancement

From the noise PSD estimate, a so called a posteriori SNR can be calculated, which acts as an input value of the signal processing in the SNR domain. Three different modules can be applyed in the SNR domain for the spectral magnitude denoising:

  • a priori SNR estimation,
  • speech presence probability estimation,
  • calculation of a spectral gain function.