Machine learning methods, in particular deep neural networks, have led to dramatic improvements in many areas. We complement these data-driven methods with model-based approaches from statistical signal processing to solve diverse speech and audio signal processing tasks in innovative ways.
Spoken language is the most important communication medium for humans, especially for (tele)communication over a distance. In addition, speech is increasingly used for communication with machines. For this to work reliably, flexibly and robustly, the recorded speech signal must be freed from external influences. The term speech signal enhancement covers methods for noise suppression, dereverberation, and the separation of speech mixtures into the speech of the individual speakers. Automatic speech recognition, in turn, refers to the transcription of speech, i.e., the conversion of the acoustic signal into a machine-readable form. We are active in all these areas, often in cooperation with well-known international companies. A special feature of our research is that we combine machine learning methods with classical methods of statistical signal processing to arrive at solutions that are more robust, energy-efficient, and explainable than would be possible with purely data-driven machine learning methods.
Speech is a fascinating signal because, in addition to the content, i.e., what is being said, it also carries a great deal of information about who is speaking and in what environment. Phonetics research investigates, among other things, which acoustic features convey para- and extralinguistic information about the state of the speaker and the environment. We believe that these research questions can be investigated in novel ways by using current machine learning techniques to manipulate speech signals in a targeted manner. To this end, we are collaborating with phoneticians at the University of Bielefeld in a project funded by the DFG, and within the Transregio TRR 318 "Constructing Explainability".
In our daily lives we are surrounded by a multitude of sounds and other acoustic signals. Often unconsciously, we evaluate these signals to form an idea of the environment and the activities in it. A technical system with similar capabilities would have a wide range of applications, for example in assistance systems, in intelligent control systems, or in supporting environment perception in autonomous driving. Together with colleagues from other German universities, we are researching, as part of a DFG research group, so-called acoustic sensor networks, which record, clean up, and classify acoustic signals via distributed sensor nodes in order to realize such applications.