Noise PSD estimation

The main task of a noise PSD tracker is to estimate the current noise power spectral density (PSD) solely from the instantaneous STFT magnitudes of the noisy signal. Noise PSD estimation in the presence of non-stationary noise is challenging and has spurred the development of many sophisticated algorithms in recent years. A closer look at how they work shows that they can be categorized by the following five techniques:

  • Because the clean speech PSD is sparse, some noise trackers search for the minimum (Min.-Search) of the noisy PSD over a certain number of previous frames; this minimum is closely related to the desired noise PSD estimate.
  • Other approaches employ voice activity detection (VAD) or speech presence probability (SPP) estimation, again exploiting the sparseness of speech, to find the noise-only time-frequency slots in which the noise PSD estimate can be updated.
  • Because the signals are random in nature, they are often modeled as realizations of random processes with a given probability density function (PDF), which enables, for example, an analytical bias compensation (Bias-Comp.) of the noise PSD estimates.
  • Furthermore, the statistical modeling facilitates Bayesian inference (Bayes-Inf.), such as minimum mean squared error (MMSE) estimators of the noise PSD.
  • Since the STFT coefficients of noise signals are correlated within a certain neighbourhood even for white noise, output smoothing (Out.-Smooth.) is another very popular technique in noise tracking.

A mandatory property of any approach to noise PSD estimation used in communication scenarios is causality.

Techniques used in state-of-the-art noise PSD estimators
Group             Min.-Search   VAD-SPP   Bias-Comp.   Bayes-Inf.   Out.-Smooth.
1. MS, BSMS            x           -          x            -             -
2. VAD-RA, SPP-FP      -           x          -            -             x
3. MCRA-based          x           x          -            -             x
4. IMCRA               x           x          x            -             x
5. MMSE-SPP            -           x          -            x             x
6. MMSE-BM             x           -          x            x             x

Developing robust causal noise PSD trackers is still a challenging and exciting task in modern research.

State-of-the-art noise PSD trackers

A very popular noise PSD tracker is the minimum statistics (MS) approach. The MS method performs a minimum search on the noisy PSD after averaging it over time with a time-variant optimal smoothing constant, and applies an elaborate bias compensation. Recently we proposed an alternative control function for calculating the optimal smoothing constant, resulting in the Bayesian-smoothed MS (BSMS) approach, which improves the intelligibility of the enhanced speech signal.
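The chain of smoothing, minimum search and bias compensation can be sketched for a single frequency bin as follows. This is a simplified illustration and not the MS algorithm itself: the function name, the fixed smoothing constant `alpha`, the window length `window` and the constant factor `bias` are assumptions, whereas MS uses a time-variant smoothing constant and a more elaborate bias compensation.

```python
from collections import deque


def min_search_tracker(noisy_psd, alpha=0.85, window=8, bias=1.5):
    """Causal minimum-search noise PSD sketch for one frequency bin.

    noisy_psd: periodogram values |Y|^2 of one bin, one per frame.
    alpha, window, bias: illustrative constants (assumptions).
    """
    smoothed = None
    recent = deque(maxlen=window)  # the last `window` smoothed values
    estimates = []
    for p in noisy_psd:
        # First-order recursive averaging of the noisy periodogram.
        if smoothed is None:
            smoothed = p
        else:
            smoothed = alpha * smoothed + (1.0 - alpha) * p
        recent.append(smoothed)
        # The minimum lies below the mean noise level, so scale it up.
        estimates.append(bias * min(recent))
    return estimates
```

Only the current and previous frames enter each estimate, so the sketch is causal. A short speech burst does not raise the estimate as long as the search window still contains frames from before the burst.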

Another noise PSD estimator, often referred to as the VAD recursive averaging (VAD-RA) approach, applies an output smoothing of the noisy PSD controlled by a rough VAD estimate that indicates speech presence. Compared to the MS approach, the noise PSD trajectories of the VAD-RA approach are smoother. The same techniques are used by an SPP-based approach with fixed priors (SPP-FP), whose authors propose replacing the hard VAD decision with a soft SPP estimate, resulting in an unbiased estimator inspired by the minimum mean squared error (MMSE) criterion.
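The difference between a hard VAD decision and a soft SPP weighting can be illustrated with one update step for a single time-frequency bin. This is a minimal sketch under assumptions: the function name, the smoothing constant `alpha` and the way the speech presence value is obtained are not taken from the methods above.

```python
def recursive_noise_update(noise_psd, noisy_power, speech_presence, alpha=0.8):
    """One output-smoothing step for one time-frequency bin.

    noise_psd: previous noise PSD estimate.
    noisy_power: current noisy periodogram value |Y|^2.
    speech_presence: 0 or 1 from a hard VAD, or a probability in [0, 1]
        from a soft SPP estimator (assumed to be given).
    alpha: illustrative smoothing constant (assumption).
    """
    # Keep the old estimate to the degree that speech is present and
    # take the noisy periodogram to the degree that it is absent.
    target = speech_presence * noise_psd + (1.0 - speech_presence) * noisy_power
    return alpha * noise_psd + (1.0 - alpha) * target
```

With `speech_presence` restricted to 0 or 1 the update is either applied fully or skipped, which corresponds to the hard decision. Intermediate values give a gradual update.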

In contrast to the SPP-FP approach, the output smoothing of the minima controlled recursive averaging (MCRA) algorithm is controlled by an SPP estimate that is based on a preceding minimum search. The MCRA method served as a cornerstone for the development of a series of further noise PSD trackers. One of them, the enhanced MCRA (EMCRA) approach, aims to reduce the estimator's delayed response to an abrupt rise in noise and to mitigate speech leakage into the noise PSD estimates. So that the SPP estimation benefits from inter-frame correlations of the speech signal, it was proposed to incorporate a first-order conditional maximum a posteriori (MAP) criterion into the MCRA noise tracker, resulting in the MCRA-MAP approach.
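How a minimum search can drive the SPP estimate, which in turn controls the output smoothing, is sketched below for one frequency bin. It is an MCRA-style illustration under assumptions, not the published algorithm: all constants are illustrative, the local minimum is taken over a plain sliding window, and smoothing across frequency is omitted.

```python
from collections import deque


def mcra_like_tracker(noisy_psd, alpha_s=0.8, alpha_p=0.2, alpha_d=0.95,
                      delta=5.0, window=50):
    """MCRA-style noise PSD sketch for one frequency bin.

    All default constants are illustrative assumptions.
    """
    smoothed = None
    noise = None
    spp = 0.0
    recent = deque(maxlen=window)  # smoothed values for the minimum search
    estimates = []
    for p in noisy_psd:
        # Smooth the noisy periodogram over time.
        if smoothed is None:
            smoothed = p
        else:
            smoothed = alpha_s * smoothed + (1.0 - alpha_s) * p
        recent.append(smoothed)
        # Speech is assumed present when the smoothed PSD is well
        # above its local minimum.
        indicator = 1.0 if smoothed > delta * min(recent) else 0.0
        spp = alpha_p * spp + (1.0 - alpha_p) * indicator
        # A high SPP pushes the smoothing constant towards 1, which
        # nearly freezes the noise estimate.
        alpha_eff = alpha_d + (1.0 - alpha_d) * spp
        if noise is None:
            noise = p
        else:
            noise = alpha_eff * noise + (1.0 - alpha_eff) * p
        estimates.append(noise)
    return estimates
```

For a stationary input the sketch reduces to ordinary recursive averaging with `alpha_d`. During a strong burst the estimate rises only slightly, because the SPP quickly approaches one.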

Another well-known MCRA-based noise PSD tracker, developed by the author of the MCRA method, is the improved MCRA (IMCRA) approach, which upgrades the minimum tracking during speech activity and the SPP estimation of the MCRA noise tracker. Furthermore, the IMCRA method implements a sophisticated bias compensation that is not available in the MCRA approach.

Using Bayesian inference to estimate the noise PSD is a particular attribute of two further MMSE-based approaches, which also make use of the output smoothing technique. Although both approaches use the same estimation rule, they embed it in the estimation procedure in different ways. The first, hereafter called MMSE-VAD, applies the MMSE estimator only to time-frequency bins without speech activity (a VAD-like estimation), whereas the second, called MMSE-BM, implements bias compensation and minimum search techniques. In the MMSE-BM approach, the minimum search realizes a so-called safety-net method that prevents the algorithm from locking up.
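For orientation, under the common assumption of complex Gaussian speech and noise coefficients, the MMSE estimate of the noise periodogram has a closed form. The sketch below implements that textbook expression for one bin. Whether it matches the exact estimation rule of the two trackers above is not stated here, and the function and parameter names are illustrative.

```python
def mmse_noise_periodogram(noisy_power, noise_psd, xi):
    """E[|N|^2 | Y] for complex Gaussian speech and noise in one bin.

    noisy_power: noisy periodogram value |Y|^2.
    noise_psd: prior noise PSD, e.g. the estimate of the previous frame.
    xi: a priori SNR (speech PSD divided by noise PSD), assumed known.
    """
    noise_share = 1.0 / (1.0 + xi)  # noise PSD / (speech PSD + noise PSD)
    # Squared posterior mean of the noise coefficient plus its
    # posterior variance, which equals xi / (1 + xi) * noise_psd.
    return noise_share ** 2 * noisy_power + (1.0 - noise_share) * noise_psd
```

With `xi = 0` the estimate equals the noisy periodogram, and for large `xi` it falls back to the prior noise PSD. In practice `xi` and `noise_psd` are themselves estimates.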

Novel DNN-based noise PSD estimator

In recent years, deep neural networks (DNNs) have made inroads into speech signal processing, and DNN-based approaches to speech enhancement have been developed. Sometimes DNNs are combined with conventional speech enhancement techniques. To achieve robust noise PSD estimation, we suggested using a causal single-channel DNN-based noise-only presence probability (NPP) estimator to control the output smoothing technique mentioned above. With the proposed DNN-based noise spectral masks, such as those depicted in the figure below, the proposed DNN-based noise PSD estimator outperformed the ten conventional approaches described above.
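How an NPP mask could control the output smoothing is sketched below. The masks are assumed to come from some DNN, which is not modeled here. The function name, the constant `alpha` and the specific way the NPP enters the smoothing constant are assumptions for illustration, not the proposed estimator.

```python
def npp_smoothing(noisy_frames, npp_masks, alpha=0.9):
    """NPP-controlled recursive smoothing over a sequence of frames.

    noisy_frames: list of frames, each a list of |Y|^2 values per bin.
    npp_masks: same shape, noise-only presence probabilities in [0, 1]
        (hypothetical DNN output). The mask of the first frame is unused.
    alpha: illustrative smoothing constant (assumption).
    """
    noise = list(noisy_frames[0])  # initialise from the first frame
    track = [list(noise)]
    for frame, mask in zip(noisy_frames[1:], npp_masks[1:]):
        for k, (p, npp) in enumerate(zip(frame, mask)):
            # npp = 1: ordinary recursive averaging with alpha;
            # npp = 0: the previous estimate is held.
            alpha_eff = alpha + (1.0 - alpha) * (1.0 - npp)
            noise[k] = alpha_eff * noise[k] + (1.0 - alpha_eff) * p
        track.append(list(noise))
    return track
```

Each frame is processed using only the current mask and the previous estimate, so the sketch stays causal as long as the mask estimator itself is causal.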